Concatemers of differentially expressed multiple genes

ABSTRACT

In the present invention are disclosed concatemers of concatenated expression cassettes and vectors that enable the synthesis of such concatemers. The concatemer comprises in the 5′→3′ direction a cassette of nucleotide sequence of the general formula [rs2-SP-PR-X-TR-SP-rs1]n wherein rs1 and rs2 together denote a functional restriction site, SP individually denotes a spacer of at least two nucleotide bases, PR denotes a promoter, capable of functioning in a cell, X denotes an expressible nucleotide sequence, TR denotes a terminator, and SP individually denotes a spacer of at least two nucleotide bases, and n≧2, and wherein at least a first cassette is different from a second cassette. The main purpose of these concatemers is the controllable and co-ordinated expression of large numbers of heterologous genes in a single host. Furthermore, the invention relates to a concatemer of cassettes of nucleotide sequences and a method for preparing the concatemers. In a further aspect, the invention relates to transgenic host cells comprising at least one concatemer according to the invention, as well as to a method for preparing the transgenic host cells. Finally, the invention relates to a vector comprising a cassette of nucleotides, a method for preparing said vector, a nucleotide library comprising at least two primary vectors each comprising a cassette of nucleotides, and a method for preparing the library.

[0001] This application is a nonprovisional of U.S. provisional application Serial No. 60/301,022 filed 27. Jun. 2001, which is hereby incorporated by reference in its entirety. The application claims priority from Danish patent application number PA 2001 00127 filed 25. Jan. 2001, which is hereby incorporated by reference in its entirety. All patent and nonpatent references cited in the application, or in the present application, are also hereby incorporated by reference in their entirety.

[0002] In the present invention are disclosed concatemers of concatenated expression cassettes and vectors that enable the synthesis of such concatemers. The main purpose of these concatemers is the controllable and co-ordinated expression of large numbers of heterologous genes in a single host cell. Furthermore, the invention relates to a concatemer of cassettes of nucleotide sequences and a method for preparing the concatemers. In a further aspect, the invention relates to transgenic host cells comprising at least one concatemer according to the invention, as well as to a method for preparing the transgenic host cells. Finally, the invention relates to a vector comprising a cassette of nucleotides, a method for preparing said vector, a nucleotide library comprising at least two primary vectors each comprising a cassette of nucleotides, and a method for preparing the library.

PRIOR ART

[0003] The design of expression constructs and expression libraries is well known in the art.

[0004] WO 96/34112 discloses a combinatorial gene expression library with a pool of expression constructs, each construct containing a cDNA or genomic DNA fragment from a plurality of donor organisms. The DNA fragments are operably associated with regulatory regions that drive expression in a host cell. The publication also discloses a combinatorial gene expression library in which each cell comprises a concatemer of cDNA fragments being operably associated with regulatory regions to drive expression of the genes encoded by the concatenated cDNA in a host organism. The host organism may be a yeast cell. The vector used for constructing the library may be a plasmid vector, a phage, a viral vector, a cosmid vector or an artificial chromosome (BAC or YAC). Suitable promoters include natural and synthetic promoters as well as constitutive and inducible promoters.

[0005] The genes used for the concatemers are prepared in a highly ordered multi-step procedure consisting of a number of discrete reaction steps. First, cDNA inserts are prepared using PCR and methylated dCTP to protect internal Not I and Bam HI restriction sites from later digestion. Promoter and terminator fragments are ligated to the 5′ and 3′ ends respectively using modified Bam HI adapters. The gene cassettes have the basic structure: promoter-coding sequence-terminator with different restriction sites in each end. The restriction site in the 3′ end is protected. Similar gene cassettes can be prepared from genomic DNA which is randomly fragmented using a restriction enzyme.

[0006] The concatemers disclosed in WO 96/34112 are prepared in a highly ordered multi-step procedure, where the first gene cassette is ligated to an adapter nucleotide sequence linked to a bead having a blunt end corresponding to the blunt 5′ end of the gene cassette. After ligation, the restriction site is no longer functional, since it was assembled using two compatible but not identical restriction sites. After linking the first gene cassette to the bead, the protected restriction site in the 3′ end is “opened” and the second gene cassette is linked to the first. After 5 to 10 rounds of ligation of gene cassettes, the vector is ligated to the 3′ end of the concatemer, the concatemer-vector construct is liberated from the bead and the 5′ end is ligated to the other end of the vector. It is emphasised that the 3′ and 5′ ends of the concatemer should be non-compatible to avoid self assembly during cloning into the vector.

[0007] Due to the plurality of discrete steps in the preparation procedure, the method is not suitable for preparing concatemers of significantly larger size. Once the gene cassettes have been cloned into the vector, it is not possible to excise the cassettes or the complete concatemer from the vector using a restriction enzyme.

[0008] U.S. Pat. No. 6,057,103 (Diversa) discloses a method for identification of clones having a specified enzyme activity through isolation of DNA from a microorganism and hybridisation with a probe DNA comprising at least part of a sequence encoding an enzyme having a specified activity. The identified sequences are linked to a promoter sequence (e.g. eukaryotic promoters: CMV immediate/early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I) and inserted into a vector, which may be a YAC or a P1 based artificial chromosome. Host cells used for transformation with the identified nucleotide sequences include bacterial cells, yeast, insect cells and mammalian cells. The disclosed vectors are not adapted for cloning of high numbers of expressible nucleotide sequences, and the reference does not disclose the concatemers for cloning of multiple genes into a host cell. More specifically the reference does not render possible the combination and screening of combinations of genes under conditions that allow genes to be combined in new ways.

[0009] U.S. Pat. No. 5,958,672 (Diversa) discloses a method for identifying protein activity of interest by culturing a gene expression library obtained from uncultivated microorganisms (or lower eukaryotic species). The gene expression library may be obtained from genomic DNA or from cDNA. The DNA clones are controllably associated to regulatory sequences, which drive the expression of the sequences in the host cell. The libraries may be screened against a DNA probe as described in U.S. Pat. No. 6,054,267. As is the case in the other references cited above, the combination of genes isolated from the uncultivated microorganisms is a static combination. Once the isolated genes are inserted into the host cells, no further gene combinations and no further optimisation of gene combinations are intended.

[0010] It is one objective of the present invention to provide methods and vectors for cloning of large numbers of expressible nucleotide sequences, which are especially adapted for later random ligation into concatemers, which can in turn be inserted into an expression host for expression of the genes in the concatemer or a sub-set of the genes or of different sub-sets of the genes in the concatemer. Thereby it becomes possible to optimise combinations of expressible nucleotide sequences for any screenable trait.

[0011] It is a further objective of the present invention to provide methods and vectors which allow for the production of concatemers which are flexible. By flexible in this context is meant that the gene cassettes of the concatemers can be disassembled and reassembled easily using standard molecular techniques, such as excision by the use of restriction enzymes.

SUMMARY OF THE INVENTION

[0012] According to a first aspect the invention relates to a nucleotide concatemer comprising in the 5′→3′ direction a cassette of nucleotide sequence of the general formula

[rs₂-SP-PR-X-TR-SP-rs₁]ₙ

[0013] wherein

[0014] rs₁ and rs₂ together denote a restriction site,

[0015] SP individually denotes a spacer of at least two nucleotide bases,

[0016] PR denotes a promoter, capable of functioning in a cell,

[0017] X denotes an expressible nucleotide sequence,

[0018] TR denotes a terminator, and

[0019] SP individually denotes a spacer of at least two nucleotide bases, and

[0020] n≧2, and

[0021] wherein at least a first cassette is different from a second cassette.
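The constraints of the general formula can be summarised in a minimal Python sketch. The class and field names (`Cassette`, `spacer_5`, `is_valid_concatemer`, and so on) are hypothetical and chosen only for illustration; the sketch merely checks the conditions stated above (spacers of at least two bases, n≧2, and at least two cassettes that differ).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cassette:
    """One [rs2-SP-PR-X-TR-SP-rs1] unit; field names are illustrative only."""
    spacer_5: str      # SP adjacent to rs2, at least two bases
    promoter: str      # PR
    expressible: str   # X
    terminator: str    # TR
    spacer_3: str      # SP adjacent to rs1, at least two bases

    def __post_init__(self) -> None:
        # Each spacer must contain at least two nucleotide bases.
        for spacer in (self.spacer_5, self.spacer_3):
            if len(spacer) < 2:
                raise ValueError("each spacer needs at least two nucleotide bases")


def is_valid_concatemer(cassettes: List[Cassette]) -> bool:
    """True when n >= 2 and at least two of the cassettes differ."""
    return len(cassettes) >= 2 and len(set(cassettes)) >= 2


# Hypothetical cassettes: names of promoters/terminators stand in for sequences.
a = Cassette("AT", "MET25", "geneA", "ADH1", "GC")
b = Cassette("AT", "CUP1", "geneB", "ADH1", "GC")
print(is_valid_concatemer([a, b]))  # two different cassettes
print(is_valid_concatemer([a, a]))  # identical cassettes only
```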

[0022] The concatemers according to the invention may comprise a selection of expressible nucleotide sequences from just one expression state and can thus be assembled from one library representing this expression state, or they may comprise cassettes from a number of different expression states mixed in any suitable ratio. The concatemers according to the invention are especially suitable for ligating into an artificial chromosome, which may be inserted into a host cell for coordinated expression. For this purpose, the variation among and between cassettes may be such as to minimise the chance of cross over as the host cell undergoes cell division, such as through minimising the level of repeat sequences occurring in any one concatemer, since it is not an object of this embodiment of the invention to obtain recombination of concatemers with a segment in the host genome or an episome of the host cells or any intraconcatemer recombination.

[0023] The concatemers can be used to make novel and non-native combinations of genes for coordinated expression in a host cell. Thereby new metabolic pathways can be generated, which may lead to the production of new metabolites, and/or to the metabolisation of compounds which are otherwise not metabolisable by the host cells. The new gene combinations may also lead to metabolic pathways which produce metabolites in new quantities or in new compartments of the cell or outside the cell. Depending on the purpose, the selection of genes can be made completely random based on sourcing of expressible nucleotide sequences across the different kingdoms. However, it may also be advantageous to source genes from sources known to have certain metabolic pathways in order to make targeted new gene combinations. It may also be advantageous to source genes from organisms/tissues known to have relevant properties, e.g. pharmaceutical activity.

[0024] One of several advantages of the concatemers of the present invention is that the expression cassettes can be cut out from the concatemers at any point to make new combinations of expression cassettes. During re-assembly, further genes comprised in similar expression cassettes may be added if desired to modify the expression pattern. In this way, the concatemers according to the present invention present a powerful tool in generating novel gene combinations.

[0025] One advantage of the structure of the concatemer is that cassettes can be recovered from the host cell through nucleotide isolation and subsequent digestion with a restriction enzyme specific for the rs₁-rs₂ restriction site. The building blocks of the concatemers may thus be disassembled and reassembled at any point.

[0026] The cassettes of the concatemer may be joined head to tail or head to head or tail to tail. All three arrangements arise because most restriction enzymes leave two identical overhangs, which may combine in either orientation at the same frequency. The orientation does not affect expression of the expressible nucleotide sequences, because each expressible nucleotide sequence is under the control of its own promoter.

[0027] However, the restriction sites can also be selected so that head to tail arrangement is favoured, for example by using restriction enzymes that generate non-palindromic overhangs. Examples of such enzymes are listed in example 6c and most of the enzymes in 6d. Non-palindromic overhangs will prevent head to head and tail to tail ligation. By the use of two or more different entry vectors, the sequence of the cutting region can be designed to yield different overhangs after digestion with a single one of these enzymes. Examples of such enzymes are most of the enzymes in example 6c and one in 6d, and have variable nucleotides (N or W) in the overhang. In this way a cassette can be excised with one enzyme that has non-identical overhangs. This will prevent intramolecular religation and prevent identical cassettes from ligating to each other, decreasing the risk of intramolecular recombination.
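The difference between palindromic and non-palindromic overhangs can be illustrated with a short Python sketch. The function names and the example overhang sequences are assumptions made for illustration and are not taken from examples 6c or 6d. An overhang that equals its own reverse complement can anneal to an identical copy of itself, so both orientations ligate; a non-palindromic overhang anneals only to its complementary partner.

```python
# Translation table for Watson-Crick complements of the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(overhang: str) -> bool:
    """True when the overhang equals its own reverse complement."""
    return overhang.upper() == reverse_complement(overhang)


def can_ligate(overhang_a: str, overhang_b: str) -> bool:
    """Two single-stranded overhangs (both written 5'->3') anneal when one
    is the reverse complement of the other."""
    return overhang_a.upper() == reverse_complement(overhang_b)


# A palindromic 4-base overhang pairs with an identical copy of itself,
# so cassettes can join in either orientation.
print(is_palindromic("CGCG"), can_ligate("CGCG", "CGCG"))

# A non-palindromic overhang (illustrative sequence) does not pair with
# itself, only with its reverse complement, which favours head to tail joins.
print(is_palindromic("ACT"), can_ligate("ACT", "ACT"), can_ligate("ACT", "AGT"))
```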

[0028] The invention in a further aspect relates to a method for concatenation comprising the steps of concatenating at least two cassettes of nucleotide sequences, each cassette comprising a first sticky end, a spacer sequence, a promoter, an expressible nucleotide sequence, a terminator, and a second sticky end.

[0029] Preferably, the method comprises starting from primary vectors comprising a cassette having the following nucleotide sequence

[RS1-RS2-SP-PR-X-TR-SP-RS2′-RS1′],

[0030] wherein X denotes an expressible nucleotide sequence,

[0031] RS1 and RS1′ denote restriction sites,

[0032] RS2 and RS2′ denote restriction sites different from RS1 and RS1′,

[0033] SP individually denotes a spacer sequence of at least two nucleotides,

[0034] PR denotes a promoter,

[0035] TR denotes a terminator,

[0036] i) cutting the primary vector with the aid of at least one restriction enzyme specific for RS2 and RS2′ obtaining cassettes having the general formula [rs₂-SP-PR-X-TR-SP-rs₁] wherein rs₁ and rs₂ together denote a functional restriction site RS2 or RS2′,

[0037] ii) assembling the cut out cassettes through interaction between rs₁ and rs₂.

[0038] According to this embodiment, excision and concatenation are carried out in a “one step” reaction, i.e. without an intervening purification step, starting from vectors containing the expression cassettes. The expression cassettes can be cut out using two restriction enzymes specific for RS1 and RS2. Preferably for this one step reaction RS1 leaves blunt ends and RS2 leaves sticky ends. Upon addition of a ligase, the concatemer can be assembled in the mixture without any need for purification, since the vector backbone and the small RS1-RS2 fragments do not interfere with the concatenation reaction.

[0039] In the case where the concatemer is to be inserted into an artificial chromosome vector for later transformation into an expression host cell, the AC vector arms can be added directly to the concatenation reaction mixture, so that the complete artificial chromosome vector containing the concatemer can be assembled in one step. By controlling the ratio of vector arms to cassettes, the size of the concatemer can be controlled. It is of course also possible to control the size of the concatemers by adding stopper fragments, the stopper fragments each having a RS2 or RS2′ in one end and a non-complementary overhang or a blunt end in the other end. Vector arms may also be added later on in a separate step.
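How the ratio of cassettes to vector arms could influence concatemer size can be illustrated with a toy model in Python. This model is not taken from the application and is only a sketch under strong assumptions: every cassette contributes two ligatable sticky ends, every arm (or stopper fragment) contributes one, a growing chain end picks its next partner at random in proportion to the available ends, and effects such as circularisation, depletion of fragments and size selection are ignored. The function names are hypothetical.

```python
def arm_capture_probability(cassettes: float, arms: float) -> float:
    """Probability that the next ligation partner of a growing end is an arm.

    Toy assumption: each cassette offers two sticky ends, each arm one.
    """
    return arms / (2 * cassettes + arms)


def expected_cassettes(cassettes: float, arms: float) -> float:
    """Mean number of cassettes added to a chain that starts at one arm
    before a second arm caps it (mean of a geometric distribution)."""
    p = arm_capture_probability(cassettes, arms)
    return (1 - p) / p


# Molar ratios of cassettes to arms, chosen for illustration only.
for cassette_amount, arm_amount in [(100, 1), (10, 1), (1, 1), (1, 5)]:
    mean_size = expected_cassettes(cassette_amount, arm_amount)
    print(f"cassettes/arms = {cassette_amount}/{arm_amount}: "
          f"about {mean_size:.1f} cassettes per concatemer")
```

In this simplified picture the mean chain length is proportional to the cassette-to-arm ratio, which is consistent with the qualitative statement that more arms (or stopper fragments) give shorter concatemers.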

[0040] Advantageously, the method comprises addition of vector arms each having a RS2 or RS2′ in one end and a non-complementary overhang or a blunt end in the other end. These can be added to the concatenation mixture, and even in this complex mixture, concatemers with one vector arm in each end will be produced under appropriate conditions. Host cells transformed with the desired construction, including the appropriate vector arms, can be selected by utilizing marker genes present on the arms.

[0041] In one aspect the invention relates to a host cell, which comprises at least one concatemer of individual oligonucleotide cassettes, each concatemer comprising oligonucleotide of the following formula in 5′→3′ direction: [rs₂-SP-PR-X-TR-SP-rs₁]ₙ, wherein rs₁ and rs₂ together denote a restriction site, SP individually denotes a spacer of at least two nucleotide bases, PR denotes a promoter, capable of functioning in the cell, X denotes an expressible nucleotide sequence, TR denotes a terminator, and SP individually denotes a spacer of at least two nucleotide bases, wherein n≧2, and wherein at least two expressible nucleotide sequences are from different expression states.

[0042] In another aspect the invention relates to a cell comprising at least one concatemer of individual oligonucleotide cassettes, each concatemer comprising oligonucleotide of the following formula in 5′→3′ direction:

[rs₂-SP-PR-X-TR-SP-rs₁]ₙ

[0043] wherein

[0044] rs₁ and rs₂ together denote a restriction site,

[0045] SP individually denotes a spacer of at least two nucleotide bases,

[0046] PR denotes a promoter, capable of functioning in the cell,

[0047] X denotes an expressible nucleotide sequence,

[0048] TR denotes a terminator, and

[0049] SP individually denotes a spacer of at least two nucleotide bases,

[0050] wherein n≧2, and

[0051] wherein rs₁-rs₂ in at least two cassettes is recognised by the same restriction enzyme.

[0052] Thereby non-naturally occurring combinations of expressible genes can be combined in one cell in such a way that coordinated expression of a subset of genes is made possible or all the inserted genes may be expressed at the same time. Through external regulation of the promoters controlling the expressible nucleotide sequences, novel and non-naturally occurring combinations of expressed genes can be obtained. Since these novel and non-natural combinations of gene products are found in one and the same cell, the heterologous gene products may affect the metabolism of the host cell in novel ways and thus cause it to produce novel and/or non-native primary or secondary metabolites and/or known metabolites in novel amounts and/or known metabolites in novel compartments of the cell or outside the cells. The novel metabolic pathways and/or novel or modified metabolites may be obtained without substantially recombining the introduced genes with any segment in the host genome or any episome of the host cells.

[0053] The cells containing the concatemers, preferably in the form of artificial chromosomes, may be used for directed evolution by subjecting populations of cells to selective conditions. One advantage of the structure of the concatemers is that expression cassettes from different cells or from different populations of cells can be combined easily in a few steps, thereby increasing the potential of the evolution. When the concatemers are inserted in the form of artificial chromosomes, evolution may also be carried out using traditional breeding and selection.

[0054] It is likely that through the combination of a high number of non-native genes in a host cell, combinations of genes or single genes are inserted that are lethal or sub-lethal to the host cell. Through the coordinated expression of the genes in the host cell it is possible not only to induce the expression of any subset of genes but also to repress such expression, e.g. of lethal or sub-lethal genes.

[0055] By producing cells with combinations of concatemers comprising cassettes with expressible nucleotide sequences from a number of different expression states, which may be from any number of expression states, from the same or from selected species, from unrelated or distantly related species, or from species from different kingdoms, novel and random combinations of gene products are produced in one single cell. By furthermore having expressible nucleotide sequences under the control of a number of independently inducible or repressible promoters, a large number of different expression states can be created inside one single cell by selectively turning on and off groups of the inserted expressible nucleotide sequences. The number of independently inducible and/or repressible promoters in one cell may vary from 1 to 100, 1 to 50, 1 to 10, or such as up to 15, 20, 25 or above 50 promoters.
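The growth in the number of expression states with the number of independently regulated promoters can be made concrete with a small Python sketch. It assumes, as a simplification, that each promoter is either fully on or fully off, so that k promoters give 2^k combinations; the function name is hypothetical and the two promoter names are used only as labels.

```python
from itertools import product
from typing import Dict, List, Sequence


def expression_states(promoters: Sequence[str]) -> List[Dict[str, bool]]:
    """Enumerate every on/off assignment for independently regulated promoters."""
    return [
        dict(zip(promoters, flags))
        for flags in product((False, True), repeat=len(promoters))
    ]


# Two independently regulated promoters give four on/off combinations.
for state in expression_states(["MET25", "CUP1"]):
    print(state)

# The count doubles with every additional independent promoter.
print(len(expression_states([f"P{i}" for i in range(10)])))
```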

[0056] By inserting novel genes into the host cell, and especially by inserting a high number of novel genes from a wide variety of species into a host cell, it is highly likely that the gene products from this array of novel genes will interact with the pool of metabolites of the host cell and modify known metabolites and/or intermediates in novel ways to create novel compounds. Since the interaction is performed at the enzyme level it is furthermore likely that the result will be novel compounds with chiral centres, which are especially difficult to synthesise via chemical synthesis.

[0057] Novel metabolic pathways may also be made that are not capable of functioning, due to the absence of a substrate in the host cell. Such metabolic pathways may be made active by addition of non-host-cell specific substrates, which are metabolisable by the novel pathways.

[0058] One special advantage of the cells according to the present invention is that incompatibility barriers between species do not limit the combinations of genes in one single cell.

[0059] According to a further aspect the invention relates to a method for producing a transgenic cell comprising inserting into a host cell a concatemer comprising a heterologous nucleotide sequence comprising at least two genes each controlled by a promoter, wherein the two genes and/or the two promoters are different.

[0060] According to a further aspect, the invention concerns a primary vector comprising a nucleotide sequence cassette of the general formula in 5′→3′ direction:

[RS1-RS2-SP-PR-CS-TR-SP-RS2′-RS1′]

[0061] wherein

[0062] RS1 and RS1′ denote restriction sites,

[0063] RS2 and RS2′ denote restriction sites different from RS1 and RS1′,

[0064] SP individually denotes a spacer sequence of at least two nucleotides,

[0065] PR denotes a promoter,

[0066] CS denotes a cloning site,

[0067] TR denotes a terminator.

[0068] The cassette within the primary vector is useful for cloning and storing in a host cell random expressible nucleotide sequences, which are under the control of the promoter comprised in the cassette. The cassette may be inserted and removed by using restriction enzymes specific for any of the four restriction sites, of which RS1 and RS1′ are preferably identical and RS2 and RS2′ are also preferably identical. One special advantage of the cassette is that a collection of cassettes may be assembled into a concatemer of cassettes according to the invention by excising the cassettes from the primary vector, e.g. through use of the restriction enzyme(s) specific for RS2 and RS2′, and concatenation of a population of random cassettes in a solution. The easiest concatenation is obtained when RS1 and RS1′ leave blunt ends and RS2 and RS2′ leave sticky ends. In this way it can be avoided that the empty vector takes part in the concatenation. Furthermore, it has been observed that the small fragments containing RS1-RS2 and RS1′-RS2′ do not have to be removed, since they do not interfere with the concatenation. If desired, the small fragments can easily be removed using e.g. precipitation or filtration.

[0069] The primary vector according to the invention is especially adapted for expression of cDNA inserted into it, because it is equipped with a promoter. Integration of genomic DNA is also possible; however, this may cause interaction between the native promoter sequences of the inserted genomic DNA and the external control obtainable through the promoter of the vector.

[0070] Preferably, PR is not functional in the host of the primary vector but only functional in the expression host, into which the concatemers are going to be inserted for expression. This is to avoid selection against genes which are lethal to the library host, in which the primary vector is stored and/or amplified.

[0071] The spacer sequence is inserted in the cassette in order to increase stability of the concatemers after concatenation. The cassettes are built so that they can be joined head to tail, head to head or tail to tail after concatenation. A concatemer of two cassettes can thus have the following structure:

3′ rs₂-SP-PR-X-TR-SP-rs₁-rs₂-SP-PR-X-TR-SP-rs₁ 5′

3′ rs₂-SP-TR-X-PR-SP-rs₁-rs₂-SP-PR-X-TR-SP-rs₁ 5′

3′ rs₂-SP-PR-X-TR-SP-rs₁-rs₂-SP-TR-X-PR-SP-rs₁ 5′

[0072] (rs₁-rs₂ together denote a restriction site)
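The three two-cassette arrangements can be enumerated with a short Python sketch. The encoding is an assumption made for illustration: "F" stands for a cassette read as PR-X-TR from left to right and "R" for the same cassette in the opposite orientation (TR-X-PR); the function name is hypothetical.

```python
from itertools import product


def junction_type(left: str, right: str) -> str:
    """Classify the junction between two adjacent cassettes.

    'F' = cassette reads PR-X-TR left to right, 'R' = TR-X-PR.
    """
    if left == right:
        # Promoter of one cassette follows the terminator of the other.
        return "head to tail"
    # F then R puts two terminators side by side; R then F two promoters.
    return "tail to tail" if left == "F" else "head to head"


# All orientation pairs for a concatemer of two cassettes.
for left, right in product("FR", repeat=2):
    print(left, right, "->", junction_type(left, right))
```

The pairs F-F and R-R describe the same head to tail molecule read from opposite strands, which is why only three distinct structures are listed.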

[0073] The presence of the spacer reduces the risk of hairpin formation between two adjacent terminator or promoter sequences, which may be identical. As a consequence, the presence of the spacer is also intended to increase the stability of the concatemers in the expression host into which they are inserted.

[0074] Using enzymes that leave non-palindromic overhangs it is possible to generate concatemers with essentially head to tail orientation.

[0075] In another aspect the invention relates to a method of preparing a primary vector comprising

[0076] inserting an expressible nucleotide sequence into a cloning site in a primary vector comprising a cassette, the cassette comprising a nucleotide sequence of the general formula in 5′→3′ direction:

[RS1-RS2-SP-PR-CS-TR-SP-RS2′-RS1′]

[0077] wherein

[0078] RS1 and RS1′ denote restriction sites,

[0079] RS2 and RS2′ denote restriction sites different from RS1 and RS1′,

[0080] SP individually denotes a spacer sequence of at least two nucleotides,

[0081] PR denotes a promoter,

[0082] CS denotes a cloning site, and

[0083] TR denotes a terminator.

[0084] In a further aspect the invention relates to a nucleotide library comprising at least two primary vectors, each vector comprising a nucleotide sequence cassette of the general formula in 5′→3′ direction:

[RS1-RS2-SP-PR-X-TR-SP-RS2′-RS1′]

[0085] wherein

[0086] RS1 and RS1′ denote restriction sites,

[0087] RS2 and RS2′ denote restriction sites different from RS1 and RS1′,

[0088] SP individually denotes a spacer sequence of at least two nucleotides,

[0089] PR denotes a promoter,

[0090] X denotes an expressible nucleotide sequence,

[0091] TR denotes a terminator,

[0092] wherein the expressible nucleotide sequences are isolated from one expression state, and

[0093] wherein at least two cassettes are different.

[0094] The nucleotide library may also be referred to as an entry (=initial) library, the intention being to use the nucleotide library as a suitable means for storing and amplifying a high number of vectors comprising the nucleotide cassettes according to the invention. It is also intended to excise the cassettes after amplification in order to use the excised cassettes for concatenation. Conveniently one nucleotide library may cover expressible nucleotide sequences from the same source pool, such as from the same expression state. Therefore, the library is conveniently used for introducing cDNA synthesised from mRNA isolated from one expression state.

[0095] Preferably, the PR sequences are not capable of functioning in the library host cells. This is to ensure that none of the expressible nucleotide sequences are expressed in, and thus lethal to, the library host cell and therefore lost from the library.

[0096] The nucleotide library furthermore provides a suitable means for later assembly of concatemers of cassettes stored in the library. According to an especially preferred embodiment of the invention, substantially all cassettes in the library are different. This difference is partly introduced to be able to coordinately express different subsets of genes and partly to minimise the level of repeat sequences occurring in the concatemers.

[0097] In a still further aspect the invention relates to a method for preparing a nucleotide library comprising obtaining expressible nucleotide sequences, cloning the expressible nucleotide sequences into cloning sites of a mixture of primary vectors, the primary vectors comprising a cassette, the cassettes comprising a nucleotide sequence of the general formula in 5′→3′ direction:

[RS1-RS2-SP-PR-CS-TR-SP-RS2′-RS1′]

[0098] wherein

[0099] RS1 and RS1′ denote restriction sites,

[0100] RS2 and RS2′ denote restriction sites different from RS1 and RS1′,

[0101] SP individually denotes a spacer sequence of at least two nucleotides,

[0102] PR denotes a promoter,

[0103] CS denotes a cloning site, and

[0104] TR denotes a terminator,

[0105] and transferring the primary vectors into a host cell obtaining a library.

[0106] Conveniently the method may comprise building a cDNA library from mRNA isolated from one expression state or starting from a cDNA library and cloning the cDNA sequences into a mixture of primary vectors according to the invention. In order to have the library and sub-libraries organised in a proper manner, each library comprises expressible nucleotide sequences representative of a given expression state.

[0107] For all the concatemers and libraries discussed herein the spacers (SP), promoters (PR), and terminators (TR) may be identical in all cassettes, but in preferred embodiments the spacers (SP), promoters (PR), and terminators (TR) are different for at least a part of the cassettes in a concatemer and a library.

BRIEF DESCRIPTION OF THE DRAWINGS

[0108] FIG. 1 shows a flow chart of the steps leading from an expression state to incorporation of the expressible nucleotide sequences in an entry library (a nucleotide library according to the invention).

[0109] FIG. 2 shows a flow chart of the steps leading from an entry library comprising expressible nucleotide sequences to evolvable artificial chromosomes (EVAC) transformed into an appropriate host cell.

[0110] FIG. 2a shows one way of producing the EVACs which includes concatenation, size selection and insertion into an artificial chromosome vector.

[0111] FIG. 2b shows a one step procedure for concatenation and ligation of vector arms to obtain EVACs.

[0112] FIG. 3 shows a model entry vector. MCS is a multi cloning site for inserting expressible nucleotide sequences. Amp R is the gene for ampicillin resistance. Col E is the origin of replication in E. coli. R1 and R2 are restriction enzyme recognition sites.

[0113] FIG. 4 shows an example of an entry vector according to the invention, EVE4. MET25 is a promoter, ADH1 is a terminator, f1 is an origin of replication for filamentous phages, e.g. M13. Spacer 1 and spacer 2 are constituted by a few nucleotides deriving from the multiple cloning site, MCS. SrfI and AscI are restriction enzyme recognition sites. Other abbreviations, see FIG. 3. The sequence of the vector is set forth in SEQ ID NO 1.

[0114] FIG. 5 shows an example of an entry vector according to the invention, EVE5. CUP1 is a promoter, ADH1 is a terminator, f1 is an origin of replication for filamentous phages, e.g. M13. Spacer 1 and spacer 2 are constituted by a few nucleotides deriving from the multiple cloning site, MCS. SrfI and AscI are restriction enzyme recognition sites. Other abbreviations, see FIG. 3. The sequence of the vector is set forth in SEQ ID NO 2.

[0115] FIG. 6 shows an example of an entry vector according to the invention, EVE8. CUP1 is a promoter, ADH1 is a terminator, f1 is an origin of replication for filamentous phages, e.g. M13. Spacer 3 is a 550 bp fragment of lambda phage DNA. Spacer 4 is an ARS1 sequence from yeast. SrfI and AscI are restriction enzyme recognition sites. Other abbreviations, see FIG. 3. The sequence of the vector is set forth in SEQ ID NO 3.

[0116] FIG. 7 shows a vector (pYAC4-AscI) for providing arms for an evolvable artificial chromosome (EVAC) into which a concatemer according to the invention can be cloned. TRP1, URA3, and HIS3 are yeast auxotrophic marker genes, and AmpR is an E. coli antibiotic marker gene. CEN4 is a centromere and TEL are telomeres. ARS1 and PMB1 allow replication in yeast and E. coli respectively. BamH I and Asc I are restriction enzyme recognition sites. The nucleotide sequence of the vector is set forth in SEQ ID NO 4.

[0117] FIG. 8 shows the general concatenation strategy. On the left is shown a circular entry vector with restriction sites, spacers, promoter, expressible nucleotide sequence and terminator. These are excised and ligated randomly.

Lane    F/Y
1       100/1
2       50/1
3       20/1
4       10/1
5       5/1
6       2/1
7       1/1
8       1/2
9       1/5

[0118] Legend: Lane M: molecular weight marker, λ-phage DNA digested with PstI. Lanes 1-9: concatenation reactions. Ratio of fragments to YAC arms (F/Y) as in the table.

[0119] FIGS. 9a and 9b illustrate the integration of concatenation with synthesis of evolvable artificial chromosomes and how concatemer size can be controlled by controlling the ratio of vector arms to expression cassettes, as described in example 7.

[0120] FIG. 10. Library of EVAC transformed population shown under 4 different growth conditions. Coloured phenotypes can be readily detected upon induction of the MET25 and/or the CUP1 promoters.

[0121] FIG. 11. EVAC gel. Legend: PFGE of EVAC containing clones. Lanes: a: yeast DNA PFGE markers (strain YNN295), b: lambda ladder, c: non-transformed host yeast, 1-9: EVAC containing clones. EVACs are in the size range 1400-1600 kb. Lane 2 shows a clone containing 2 EVACs sized ~1500 kb and ~550 kb respectively. The 550 kb EVAC comigrates with the 564 kb yeast chromosome, resulting in an increased intensity of the band at 564 kb relative to the other bands in the lane. Arrows point up to EVAC bands.

DEFINITIONS

[0122] Oligonucleotides

[0123] Any nucleic acid fragment of approximately 2 to 10000 nucleotides.

[0124] Restriction Site

[0125] For the purposes of the present invention the abbreviation RSn (n=1,2,3, etc) is used to designate a nucleotide sequence comprising a restriction site. A restriction site is defined by a recognition sequence and a cleavage site. The cleavage site may be located within or outside the recognition sequence. The abbreviation “rs₁” or “rs₂” is used to designate the two ends of a restriction site after cleavage. The sequence “rs₁-rs₂” together designates a complete restriction site.

[0126] The cleavage site of a restriction site may leave a double stranded polynucleotide sequence with either blunt or sticky ends. Thus, “rs₁” or “rs₂” may designate either a blunt or a sticky end.
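
As an illustration of the rs₁/rs₂ notation, the following minimal Python sketch splits a recognition sequence at its cleavage position and reports whether the resulting ends are blunt or sticky. The function name `cleave` and its coordinate convention are assumptions made for this illustration only. The two sites shown are SrfI (GCCC^GGGC, blunt) and AscI (GG^CGCGCC, four-base 5′ overhang), the enzymes named for the entry vectors of FIGS. 4 to 6.

```python
def cleave(recognition, top_cut, bottom_cut):
    """Split a recognition sequence into its two ends, rs1 and rs2.

    top_cut and bottom_cut are cut positions counted from the 5' end of
    the top strand (the bottom-strand cut is given in top-strand
    coordinates). Returns (rs1, rs2, end_type) as top-strand sequences.
    """
    rs1 = recognition[:top_cut]
    rs2 = recognition[top_cut:]
    # Cuts at the same position on both strands leave blunt ends;
    # staggered cuts leave single-stranded overhangs (sticky ends).
    end_type = "blunt" if top_cut == bottom_cut else "sticky"
    return rs1, rs2, end_type


# SrfI cuts both strands in the middle of GCCCGGGC.
SRFI = cleave("GCCCGGGC", 4, 4)
# AscI cuts the top strand after GG and the bottom strand 4 bases later.
ASCI = cleave("GGCGCGCC", 2, 6)

print(SRFI)
print(ASCI)
```

In both cases rs₁ followed by rs₂ restores the complete recognition sequence, which is the sense in which “rs₁-rs₂” designates a complete restriction site.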

[0127] In the notation used throughout the present invention, formulae like:

RS1-RS2-SP—PR—X-TR—SP—RS2-RS1

[0128] should be interpreted to mean that the individual sequences follow in the order specified. This does not exclude that part of the recognition sequence of e.g. RS2 overlaps with the spacer sequence, but it is a strict requirement that all the items except RS1 and RS1′ are functional and remain functional after cleavage and re-assemblage. Furthermore, the formulae do not exclude the possibility of having additional sequences inserted between the listed items. For example, introns can be inserted as described in the invention below, and further spacer sequences can be inserted between RS1 and RS2 and between TR and RS2. What is important is that the sequences remain functional.

[0129] Furthermore, when reference is made to the size of the restriction site and/or to specific bases within it, only the bases in the recognition sequence are referred to.

[0130] Expression State

[0131] An expression state is a state in any specific tissue of any individual organism at any one time. Any change in conditions leading to changes in gene expression leads to another expression state. Different expression states are found in different individuals and in different species, but they may also be found in different organs in the same species or individual, and in different tissue types in the same species or individual. Different expression states may also be obtained in the same organ or tissue in any one species or individual by exposing the tissues or organs to different environmental conditions comprising but not limited to changes in age, disease, infection, drought, humidity, salinity, exposure to xenobiotics, physiological effectors, temperature, pressure, pH, light, gaseous environment, and chemicals such as toxins.

[0132] Artificial Chromosome

[0133] As used herein, an artificial chromosome (AC) is a piece of DNA that can stably replicate and segregate alongside endogenous chromosomes. For eukaryotes the artificial chromosome may also be described as a nucleotide sequence of substantial length comprising a functional centromere, functional telomeres, and at least one autonomous replicating sequence. It has the capacity to accommodate and express heterologous genes inserted therein. It is referred to as a mammalian artificial chromosome (MAC) when it contains an active mammalian centromere. Plant artificial chromosome and insect artificial chromosome (BUGAC) refer to chromosomes that include plant and insect centromeres, respectively. A human artificial chromosome (HAC) refers to a chromosome that includes human centromeres. AVACs refer to avian artificial chromosomes. A yeast artificial chromosome (YAC) refers to chromosomes that are functional in yeast, such as chromosomes that include a yeast centromere.

[0134] As used herein, stable maintenance of chromosomes occurs when at least about 85%, preferably 90%, more preferably 95% of the cells retain the chromosome. Stability is measured in the presence of a selective agent. Preferably these chromosomes are also maintained in the absence of a selective agent. Stable chromosomes also retain their structure during cell culturing, suffering neither intrachromosomal nor interchromosomal rearrangements.
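
The retention thresholds above can be expressed as a simple check. The following Python sketch is illustrative only: the function name `retention_class` and its labels are hypothetical, and the text's "at least about 85%" is treated here as a hard 85% cut-off, which is a simplifying assumption.

```python
def retention_class(cells_with_chromosome, cells_total):
    """Classify chromosome retention against the 85/90/95 % thresholds.

    Integer arithmetic is used so that boundary values such as exactly
    85 % are not affected by floating-point rounding.
    """
    if cells_total <= 0:
        raise ValueError("cells_total must be positive")
    percent_x_total = cells_with_chromosome * 100
    if percent_x_total >= 95 * cells_total:
        return "stable (more preferred, >= 95 %)"
    if percent_x_total >= 90 * cells_total:
        return "stable (preferred, >= 90 %)"
    if percent_x_total >= 85 * cells_total:
        return "stable (>= 85 %)"
    return "not stably maintained"


# Hypothetical colony counts, for illustration only.
print(retention_class(96, 100))
print(retention_class(85, 100))
print(retention_class(70, 100))
```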

DETAILED DESCRIPTION OF THE INVENTION

[0135] In the following the invention is described in the order in which the steps of obtaining a transformed host cell containing an evolvable artificial chromosome may be performed, starting with the entry vector.

[0136] Origin of Expressible Nucleotide Sequences

[0137] The expressible nucleotide sequences that can be inserted into the vectors, concatemers, and cells according to this invention encompass any type of nucleotide such as RNA, DNA. Such a nucleotide sequence could be obtained e.g. from cDNA, which by its nature is expressible. But it is also possible to use sequences of genomic DNA, coding for specific genes. Preferably, the expressible nucleotide sequences correspond to full length genes such as substantially full length cDNA, but nucleotide sequences coding for shorter peptides than the original full length mRNAs may also be used. Shorter peptides may still retain catalytic activity similar to that of the native proteins.

[0138] Another way to obtain expressible nucleotide sequences is through chemical synthesis of nucleotide sequences coding for known peptide or protein sequences. Thus the expressible DNA sequence does not have to be a naturally occurring sequence, although it may be preferable for practical purposes to primarily use naturally occurring nucleotide sequences. Whether the DNA is single or double stranded will depend on the vector system used.

[0139] In most cases the orientation with respect to the promoter of an expressible nucleotide sequence will be such that the coding strand is transcribed into a proper mRNA. It is however conceivable that the sequence may be reversed, generating an antisense transcript in order to block expression of a specific gene.

[0140] Cassettes

[0141] An important aspect of the invention concerns a cassette of nucleotides in a highly ordered sequence, the cassette having the general formula in 5′→3′ direction:

[RS1-RS2-SP—PR—CS-TR—SP—RS2′-RS1′]

[0142] wherein RS1 and RS1′ denote restriction sites, RS2 and RS2′ denote restriction sites different from RS1 and RS1′, SP individually denotes a spacer sequence of at least two nucleotides, PR denotes a promoter, CS denotes a cloning site, and TR denotes a terminator.
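
The ordering of elements in the cassette formula can be sketched as a small data structure. The Python below is an illustration under stated assumptions, not part of the disclosed method: the class name `Cassette`, its field names, and the placeholder strings `<PR>`, `<CS>` and `<TR>` are hypothetical, and RS1′ and RS2′ are assumed to have the same sequence as RS1 and RS2, as is the case for palindromic sites.

```python
from dataclasses import dataclass


@dataclass
class Cassette:
    """Elements of [RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1'] in 5'->3' order."""
    rs1: str
    rs2: str
    spacer_5: str
    promoter: str
    cloning_site: str
    terminator: str
    spacer_3: str

    def layout(self):
        # Each spacer must be at least two nucleotides long.
        for spacer in (self.spacer_5, self.spacer_3):
            if len(spacer) < 2:
                raise ValueError("spacer must be at least two nucleotides")
        # RS1 and RS2 must be different restriction sites.
        if self.rs1 == self.rs2:
            raise ValueError("RS2 must differ from RS1")
        parts = [self.rs1, self.rs2, self.spacer_5, self.promoter,
                 self.cloning_site, self.terminator, self.spacer_3,
                 self.rs2, self.rs1]
        return "-".join(parts)


# Placeholder elements; only the two recognition sequences are real.
example = Cassette(rs1="GCCCGGGC", rs2="GGCGCGCC", spacer_5="AT",
                   promoter="<PR>", cloning_site="<CS>",
                   terminator="<TR>", spacer_3="TA")
print(example.layout())
```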

[0143] It is an advantage to have two different restriction sites flanking both sides of the expression construct. By treating the primary vectors with restriction enzymes cleaving both restriction sites, the expression construct and the primary vector will be left with two non-compatible ends. This facilitates a concatenation process, since the empty vectors do not participate in the concatenation of expression constructs.

[0144] Restriction Sites

[0145] In principle, any restriction site for which a restriction enzyme is known can be used. These include the restriction enzymes generally known and used in the field of molecular biology, such as those described in Sambrook, Fritsch, Maniatis, “Molecular Cloning: A Laboratory Manual”, 2^(nd) edition, Cold Spring Harbor Laboratory Press, 1989.

[0146] The restriction site recognition sequences preferably are of a substantial length, so that the likelihood of occurrence of an identical restriction site within the cloned oligonucleotide is minimised. Thus the first restriction site may comprise at least 6 bases, but more preferably the recognition sequence comprises at least 7 or 8 bases. Restriction sites having 7 or more non N bases in the recognition sequence are generally known as “rare restriction sites” (see example 6). However, the recognition sequence may also be at least 10 bases, such as at least 15 bases, for example at least 16 bases, such as at least 17 bases, for example at least 18 bases, such as at least 19 bases, for example at least 20 bases, such as at least 21 bases, for example at least 22 bases, such as at least 23 bases, for example at least 25 bases, such as at least 30 bases, for example at least 35 bases, such as at least 40 bases, for example at least 45 bases, such as at least 50 bases.
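
The effect of recognition sequence length on the likelihood of an internal site can be estimated with a back-of-the-envelope calculation. The Python sketch below assumes a random insert with equal frequencies of the four bases and a fully specified (no N) palindromic recognition sequence. Real sequences deviate from this model, so the numbers are rough guides only. The function name `expected_sites` is hypothetical.

```python
def expected_sites(insert_length, site_length):
    """Expected number of occurrences of one fixed recognition sequence
    in a random insert with equal base frequencies.

    For a palindromic site the two strands read the same, so scanning
    one strand is sufficient.
    """
    positions = max(insert_length - site_length + 1, 0)
    return positions / 4 ** site_length


# Compare a 6-base and an 8-base site in a 2000-base insert
# (the insert length is an arbitrary illustrative value).
for n in (6, 8):
    print(n, expected_sites(2000, n))
```

Under this model each additional fully specified base lowers the expected number of internal sites roughly fourfold, which is why 8-base sites are encountered about 16 times less often than 6-base sites.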

[0147] Preferably the first restriction site, RS1 and RS1′, is recognised by a restriction enzyme generating blunt ends of the double stranded nucleotide sequences. By generating blunt ends at this site, the risk that the vector participates in a subsequent concatenation is greatly reduced. The first restriction site may also give rise to sticky ends, but these are then preferably non-compatible with the sticky ends resulting from the second restriction site, RS2 and RS2′, and with the sticky ends in the AC.

[0148] According to a preferred embodiment of the invention, the second restriction site, RS2 and RS2′, comprises a rare restriction site. Thus, the longer the recognition sequence of the rare restriction site, the more rare it is and the less likely it is that the restriction enzyme recognising it will cleave the nucleotide sequence at other, undesired, positions.

[0149] The rare restriction site may furthermore serve as a PCR priming site. Thereby it is possible to copy the cassettes via PCR techniques and thus indirectly “excise” the cassettes from a vector.

[0150] Spacer Sequence

[0151] The spacer sequence located between the RS2 and the PR sequence is preferably a non-transcribed spacer sequence. The purpose of the spacer sequence(s) is to minimise recombination between different concatemers present in the same cell or between cassettes present in the same concatemer, but it may also serve the purpose of making the nucleotide sequences in the cassettes more “host” like. A further purpose of the spacer sequence is to reduce the occurrence of hairpin formation between adjacent palindromic sequences, which may occur when cassettes are assembled head to head or tail to tail. Spacer sequences may also be convenient for introducing short conserved nucleotide sequences that may serve e.g. as PCR primer sites or as target for hybridization to e.g. nucleic acid or PNA or LNA probes allowing affinity purification of cassettes.

[0152] The cassette may also optionally comprise another spacer sequence of at least two nucleotides between TR and RS2. When cassettes are cut out from a vector and concatenated into concatemers of cassettes, the spacer sequences together ensure that there is a certain distance between two successive identical promoter and/or terminator sequences. This distance may comprise at least 50 bases, such as at least 60 bases, for example at least 75 bases, such as at least 100 bases, for example at least 150 bases, such as at least 200 bases, for example at least 250 bases, such as at least 300 bases, for example at least 400 bases, for example at least 500 bases, such as at least 750 bases, for example at least 1000 bases, such as at least 1100 bases, for example at least 1200 bases, such as at least 1300 bases, for example at least 1400 bases, such as at least 1500 bases, for example at least 1600 bases, such as at least 1700 bases, for example at least 1800 bases, such as at least 1900 bases, for example at least 2000 bases, such as at least 2100 bases, for example at least 2200 bases, such as at least 2300 bases, for example at least 2400 bases, such as at least 2500 bases, for example at least 2600 bases, such as at least 2700 bases, for example at least 2800 bases, such as at least 2900 bases, for example at least 3000 bases, such as at least 3200 bases, for example at least 3500 bases, such as at least 3800 bases, for example at least 4000 bases, such as at least 4500 bases, for example at least 5000 bases, such as at least 6000 bases.

[0153] The number of the nucleotides between the spacer located 5′ to the PR sequence and the one located 3′ to the TR sequence may be any. However, it may be advantageous to ensure that at least one of the spacer sequences comprises between 100 and 2500 bases, preferably between 200 and 2300 bases, more preferably between 300 and 2100 bases, such as between 400 and 1900 bases, more preferably between 500 and 1700 bases, such as between 600 and 1500 bases, more preferably between 700 and 1400 bases.

[0154] If the intended host cell is yeast, the spacers present in a concatemer should preferably comprise a combination of a few ARSes with varying lambda phage DNA fragments.

[0155] Preferred examples of spacer sequences include but are not limited to: lambda phage DNA, prokaryotic genomic DNA such as E. coli genomic DNA, ARSes.

[0156] Promoter

[0157] A promoter is a DNA sequence to which RNA polymerase binds and initiates transcription. The promoter determines the polarity of the transcript by specifying which strand will be transcribed.

[0158] Bacterial promoters normally consist of −35 and −10 (relative to the transcriptional start) consensus sequences which are bound by a specific sigma factor and RNA polymerase.

[0159] Eukaryotic promoters are more complex. Most promoters utilized in expression vectors are transcribed by RNA polymerase II. General transcription factors (GTFs) first bind specific sequences near the transcriptional start and then recruit the binding of RNA polymerase II. In addition to these minimal promoter elements, small sequence elements are recognized specifically by modular DNA-binding/trans-activating proteins (e.g. AP-1, SP-1) which regulate the activity of a given promoter.

[0160] Viral promoters may serve the same function as bacterial and eukaryotic promoters. Upon viral infection of their host, viral promoters direct transcription either by using host transcriptional machinery or by supplying virally encoded enzymes to substitute part of the host machinery. Viral promoters are recognised by the transcriptional machinery of a large number of host organisms and are therefore often used in cloning and expression vectors.

[0161] Promoters may furthermore comprise regulatory elements, which are DNA sequence elements which act in conjunction with promoters and bind either repressors (e.g., lacO/LAC Iq repressor system in E. coli) or inducers (e.g., gal1/GAL4 inducer system in yeast). In either case, transcription is virtually “shut off” until the promoter is derepressed or induced, at which point transcription is “turned-on”. The choice of promoter in the cassette is primarily dependent on the host organism into which the cassette is intended to be inserted. An important requirement to this end is that the promoter should preferably be capable of functioning in the host cell in which the expressible nucleotide sequence is to be expressed.

[0162] Preferably the promoter is an externally controllable promoter, such as an inducible promoter and/or a repressible promoter. The promoter may be controllable (repressible/inducible) by chemicals, such as by the absence/presence of chemical inducers, e.g. metabolites, substrates, metals, hormones, sugars. The promoter may likewise be controllable by certain physical parameters such as temperature, pH, redox status, growth stage, developmental stage, or the promoter may be inducible/repressible by a synthetic inducer/repressor such as the gal inducer.

[0163] In order to avoid unintentional interference with the gene regulation systems of the host cell, and in order to improve controllability of the coordinated gene expression, the promoter is preferably a synthetic promoter. Suitable promoters are described in U.S. Pat. No. 5,798,227 and U.S. Pat. No. 5,667,986. Principles for designing suitable synthetic eukaryotic promoters are disclosed in U.S. Pat. No. 5,559,027, U.S. Pat. No. 5,877,018 or U.S. Pat. No. 6,072,050.

[0164] Synthetic inducible eukaryotic promoters for the regulation of transcription of a gene may achieve improved levels of protein expression and lower basal levels of gene expression. Such promoters preferably contain at least two different classes of regulatory elements, usually by modification of a native promoter containing one of the inducible elements by inserting the other of the inducible elements. For example, additional metal responsive elements (MREs) and/or glucocorticoid responsive elements (GREs) may be provided to native promoters. Additionally, one or more constitutive elements may be functionally disabled to provide the lower basal levels of gene expression.

[0165] Preferred examples of promoters include but are not limited to those promoters being induced and/or repressed by any factor selected from the group comprising carbohydrates, e.g. galactose; low inorganic phosphate levels; temperature, e.g. low or high temperature shift; metals or metal ions, e.g. copper ions; hormones, e.g. dihydrotestosterone; deoxycorticosterone; heat shock (e.g. 39° C.); methanol; redox-status; growth stage, e.g. developmental stage; synthetic inducers, e.g. gal inducer. Examples of such promoters include ADH 1, PGK 1, GAP 491, TPI, PYK, ENO, PMA 1, PHO5, GAL 1, GAL 2, GAL 10, MET25, ADH2, MEL 1, CUP 1, HSE, AOX, MOX, SV40, CaMV, Opaque-2, GRE, ARE, PGK/ARE hybrid, CYC/GRE hybrid, TPI/α2 operator, AOX 1, MOX A.

[0166] More preferably, however, the promoter is selected from hybrid promoters such as PGK/ARE hybrid, CYC/GRE hybrid or from synthetic promoters. Such promoters can be controlled without interfering too much with the regulation of native genes in the expression host.

[0167] Yeast Promoters

[0168] In the following, examples of known yeast promoters that may be used in conjunction with the present invention are shown. The examples are in no way limiting and only serve to indicate to the skilled practitioner how to select or design promoters that are useful according to the present invention.

[0169] Although numerous transcriptional promoters which are functional in yeasts have been described in the literature, only some of them have proved effective for the production of polypeptides by the recombinant route. There may be mentioned in particular the promoters of the PGK genes (3-phosphoglycerate kinase), TDH genes encoding GAPDH (glyceraldehyde phosphate dehydrogenase), TEF1 genes (elongation factor 1), and MFα1 (α sex pheromone precursor), which are considered as strong constitutive promoters, or alternatively the regulatable promoter CYC1, which is repressed in the presence of glucose, or PHO5, which can be regulated by thiamine. However, for reasons which are often unexplained, they do not always allow the effective expression of the genes which they control. In this context, it is always advantageous to be able to have new promoters in order to generate new effective host/vector systems. Furthermore, having a choice of effective promoters in a given cell also makes it possible to envisage the production of multiple proteins in this same cell (for example several enzymes of the same metabolic chain) while avoiding the problems of recombination between homologous sequences.

[0170] In general, a promoter region is situated in the 5′ region of the genes and comprises all the elements allowing the transcription of a DNA fragment placed under their control, in particular:

[0171] (1) a so-called minimal promoter region comprising the TATA box and the site of initiation of transcription, which determines the position of the site of initiation as well as the basal level of transcription. In Saccharomyces cerevisiae, the length of the minimal promoter region is relatively variable. Indeed, the exact location of the TATA box varies from one gene to another and may be situated from −40 to −120 nucleotides upstream of the site of the initiation (Chen and Struhl, 1985, EMBO J., 4, 3273-3280).

[0172] (2) sequences situated upstream of the TATA box (immediately upstream up to several hundreds of nucleotides) which make it possible to ensure an effective level of transcription either constitutively (relatively constant level of transcription all along the cell cycle, regardless of the conditions of culture) or in a regulatable manner (activation of transcription in the presence of an activator and/or repression in the presence of a repressor). These sequences may be of several types: activator, inhibitor, enhancer, inducer, repressor, and may respond to cellular factors or varied culture conditions.

[0173] Examples of such promoters are the ZZA1 and ZZA2 promoters disclosed in U.S. Pat. No. 5,641,661, the EF1-α protein promoter and the ribosomal protein S7 gene promoter disclosed in WO 97/44470, the COX 4 promoter and two unknown promoters (SEQ ID No: 1 and 2 in the document) disclosed in U.S. Pat. No. 5,952,195. Other useful promoters include the HSP150 promoter disclosed in WO 98/54339 and the SV40 and RSV promoters disclosed in U.S. Pat. No. 4,870,013 as well as the PyK and GAPDH promoters disclosed in EP 0 329 203 A1.

[0174] Synthetic Yeast Promoters

[0175] More preferably the invention employs synthetic promoters. Synthetic promoters are often constructed by combining the minimal promoter region of one gene with the upstream regulating sequences of another gene. Enhanced promoter control may be obtained by modifying specific sequences in the upstream regulating sequences, e.g. through substitution or deletion or through inserting multiple copies of specific regulating sequences. One advantage of using synthetic promoters is that they may be controlled without interfering too much with the native promoters of the host cell.

[0176] One such synthetic yeast promoter comprises promoters or promoter elements of two different yeast-derived genes, yeast killer toxin leader peptide, and amino terminus of IL-1β (WO 98/54339).

[0177] Another example of a yeast synthetic promoter is disclosed in U.S. Pat. No. 5,436,136 (Hinnen et al), which concerns a yeast hybrid promoter including a 5′ upstream promoter element comprising upstream activation site(s) of the yeast PHO5 gene and a 3′ downstream promoter element of the yeast GAPDH gene starting at nucleotide −300 to −180 and ending at nucleotide −1 of the GAPDH gene.

[0178] Another example of a yeast synthetic promoter is disclosed in U.S. Pat. No. 5,089,398 (Rosenberg et al). This disclosure describes a promoter with the general formula:

(P.R.(2)−P.R.(1))—

[0179] wherein:

[0180] P.R.(1) is the promoter region proximal to the coding sequence and having the transcription initiation site, the RNA polymerase binding site, and including the TATA box, the CAAT sequence, as well as translational regulatory signals, e.g., capping sequence, as appropriate;

[0181] P.R.(2) is the promoter region joined to the 5′-end of P.R.(1) associated with enhancing the efficiency of transcription of the RNA polymerase binding region.

[0182] U.S. Pat. No. 4,945,046 (Horii et al) discloses a further example of how to design a synthetic yeast promoter. This specific promoter comprises promoter elements derived both from yeast and from a mammal. The hybrid promoter consists essentially of the Saccharomyces cerevisiae PHO5 or GAP-DH promoter from which the upstream activation site (UAS) has been deleted and replaced by the early enhancer region derived from SV40 virus.

[0183] Cloning Site

[0184] The cloning site in the cassette in the primary vector should be designed so that any nucleotide sequence can be cloned into it.

[0185] The cloning site in the cassette preferably allows directional cloning. This ensures that transcription in a host cell is performed from the coding strand in the intended direction and that the translated peptide is identical to the peptide for which the original nucleotide sequence codes.

[0186] However, according to some embodiments it may be advantageous to insert the sequence in the opposite direction. According to these embodiments, so-called antisense constructs may be inserted which prevent functional expression of specific genes involved in specific pathways. Thereby it may become possible to divert metabolic intermediates from a prevalent pathway to another less dominant pathway.

[0187] The cloning site in the cassette may comprise multiple cloning sites, generally known as MCS or polylinker sites, which are synthetic DNA sequences encoding a series of restriction endonuclease recognition sites. These sites are engineered for convenient cloning of DNA into a vector at a specific position and for directional cloning of the insert.

[0188] Cloning of cDNA does not have to involve the use of restriction enzymes. Other alternative systems include but are not limited to:

[0189] Creator™ Cre-loxP system from Clontech, which uses recombination and loxP sites

[0190] use of Lambda attachment sites (att-λ), such as the Gateway™ system from Life Technologies.

[0191] Both of these systems are directional.

[0192] Terminator

[0193] The role of the terminator sequence is to limit transcription to the length of the coding sequence. An optimal terminator sequence is thus one which is capable of performing this act in the host cell.

[0194] In prokaryotes, sequences known as transcriptional terminators signal the RNA polymerase to release the DNA template and stop transcription of the nascent RNA.

[0195] In eukaryotes, RNA molecules are transcribed well beyond the end of the mature mRNA molecule. New transcripts are enzymatically cleaved and modified by the addition of a long sequence of adenylic acid residues known as the poly-A tail. A polyadenylation consensus sequence is located about 10 to 30 bases upstream from the actual cleavage site.

[0196] Preferred examples of yeast derived terminator sequences include, but are not limited to: ADN1, CYC1, GPD, ADH1 alcohol dehydrogenase.

[0197] Intron

[0198] Optionally, the cassette in the vector comprises an intron sequence, which may be located 5′ or 3′ to the expressible nucleotide sequence. The design and layout of introns is well known in the art. The choice of intron design largely depends on the intended host cell in which the expressible nucleotide sequence is eventually to be expressed. The effects of having an intron sequence in the expression cassettes are those generally associated with intron sequences.

[0199] Examples of yeast introns can be found in the literature and in specific databases such as Ares Lab Yeast Intron Database (Version 2.1) as updated on 15 Apr. 2000. Earlier versions of the database as well as extracts of the database have been published in: “Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae.” by Spingola M, Grate L, Haussler D, Ares M Jr. (RNA February 1999;5(2):221-34) and “Test of intron predictions reveals novel splice sites, alternatively spliced mRNAs and new introns in meiotically regulated genes of yeast.” by Davis C A, Grate L, Spingola M, Ares M Jr. (Nucleic Acids Res Apr. 15, 2000;28(8):1700-6).

[0200] Primary Vectors (Entry Vectors)

[0201] By the term entry vector is meant a vector for storing and amplifying cDNA or other expressible nucleotide sequences using the cassettes according to the present invention. The primary vectors are preferably able to propagate in E. coli or any other suitable standard host cell. They should preferably be amplifiable and amenable to standard normalisation and enrichment procedures.

[0202] The primary vector may be of any type of DNA that has the basic requirements of a) being able to replicate itself in at least one suitable host organism, b) allowing insertion of foreign DNA which is then replicated together with the vector, and c) preferably allowing selection of vector molecules that contain insertions of said foreign DNA. In a preferred embodiment the vector is able to replicate in standard hosts like yeasts and bacteria, and it should preferably have a high copy number per host cell. It is also preferred that the vector, in addition to a host specific origin of replication, contains an origin of replication for a single stranded virus, such as e.g. the f1 origin for filamentous phages. This will allow the production of single stranded nucleic acid which may be useful for normalisation and enrichment procedures of cloned sequences. A vast number of cloning vectors have been described which are commonly used, and references may be given to e.g. Sambrook, J; Fritsch, E. F; and Maniatis T. (1989) Molecular Cloning: A laboratory manual. Cold Spring Harbor Laboratory Press, USA; Netherlands Culture Collection of Bacteria (www.cbs.knaw.nl/NCCB/collection.htm); or Department of Microbial Genetics, National Institute of Genetics, Yata 1111 Mishima Shizuoka 411-8540, Japan (www.shigen.nig.ac.jp/cvector/cvector.html). A few type-examples that are the parents of many popular derivatives are M13mp10, pUC18, Lambda gt10, and pYAC4. Examples of primary vectors include but are not limited to M13K07, pBR322, pUC18, pUC19, pUC118, pUC119, pSP64, pSP65, pGEM-3, pGEM-3Z, pGEM-3Zf(−), pGEM4, pGEM-4Z, πAN13, pBluescript II, CHARON 4A, λ⁺, CHARON 21A, CHARON 32, CHARON 33, CHARON 34, CHARON 35, CHARON 40, EMBL3A, λ2001, λDASH, λFIX, λgt10, λgt11, λgt18, λgt20, λgt22, λORF8, λZAP/R, pJB8, c2RB, pcos1EMBL.

[0203] Methods for cloning of cDNA or genomic DNA into a vector are well known in the art. Reference may be given to J. Sambrook, E. F. Fritsch, T. Maniatis: Molecular Cloning, A Laboratory Manual (2^(nd) edition, Cold Spring Harbor Laboratory Press, 1989).

[0204] One example of a circular model entry vector is described in FIG. 3. The vector, EVE, contains the expression cassette, R1-R2-Spacer-Promoter-Multi Cloning Site-Terminator-Spacer-R2-R1. The vector furthermore contains a gene for ampicillin resistance, AmpR, and an origin of replication for E. coli, ColE1.

[0205] The entry vectors EVE4, EVE5, and EVE8 are shown in FIGS. 4, 5, and 6. These all contain SrfI as R1 and AscI as R2. Both of these sites are palindromic and are regarded as rare restriction sites having 8 bases in the recognition sequence. The vectors furthermore contain the AmpR ampicillin resistance gene, and the ColE1 origin of replication for E. coli as well as f1, which is an origin of replication for filamentous phages, such as M13. EVE4 (FIG. 4) contains the MET25 promoter and the ADH1 terminator. Spacer 1 and spacer 2 are short sequences deriving from the multiple cloning site, MCS. EVE5 (FIG. 5) contains the CUP1 promoter and the ADH1 terminator. EVE8 (FIG. 6) contains the CUP1 promoter and the ADH1 terminator. The spacers of EVE8 are a 550 bp lambda phage DNA fragment (spacer 3) and an ARS sequence from yeast (spacer 4).

[0206] Nucleotide Library (Entry Library)

[0207] Methods as well as suitable vectors and host cells for constructing and maintaining a library of nucleotide sequences in a cell are well known in the art. The primary requirement for the library is that it should be possible to store and amplify in it a number of primary vectors (constructs) according to this invention, the vectors (constructs) comprising expressible nucleotide sequences from at least one expression state and wherein at least two vectors (constructs) are different.

[0208] One specific example of such a library is the well known and widely employed cDNA library. The advantage of the cDNA library is mainly that it contains only DNA sequences corresponding to transcribed messenger RNA in a cell. Suitable methods are also available to purify the isolated mRNA or the synthesised cDNA so that only substantially full-length cDNA is cloned into the library.

[0209] Methods for optimisation of the process to yield substantially full length cDNA may comprise size selection, e.g. electrophoresis, chromatography, precipitation, or may comprise ways of increasing the likelihood of getting full length cDNAs, e.g. the SMART™ method (Clontech) or the CapTrap™ method (Stratagene).

[0210] Preferably the method for making the nucleotide library comprises obtaining a substantially full length cDNA population comprising a normalised representation of cDNA species. More preferably the substantially full length cDNA population comprises a normalised representation of cDNA species characteristic of a given expression state.

[0211] Normalisation reduces the redundancy of clones representing abundant mRNA species and increases the relative representation of clones from rare mRNA species.
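The effect of normalisation on library composition can be illustrated with a toy calculation. The sketch below is not a model of any hybridisation protocol: it simply assumes that normalisation caps the number of clones per mRNA species at a ceiling, and the species names, clone counts and the `ceiling` parameter are hypothetical values chosen for illustration.

```python
def normalise(clone_counts, ceiling):
    """Toy model: cap every species at `ceiling` clones (assumption, not a protocol)."""
    return {species: min(count, ceiling) for species, count in clone_counts.items()}


def fractions(clone_counts):
    """Relative representation of each species in the library."""
    total = sum(clone_counts.values())
    return {species: count / total for species, count in clone_counts.items()}


# Hypothetical library dominated by one abundant mRNA species.
library = {"abundant_mRNA": 9000, "moderate_mRNA": 900, "rare_mRNA": 100}

before = fractions(library)
after = fractions(normalise(library, ceiling=100))

print(f"rare species before: {before['rare_mRNA']:.3f}")
print(f"rare species after:  {after['rare_mRNA']:.3f}")
```

Under these assumptions the rare species rises from 1% of the clones to one third, while the abundant species falls from 90% to one third.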

[0212] Methods for normalisation of cDNA libraries are well known in the art. Reference may be given to suitable protocols for normalisation such as those described in U.S. Pat. No. 5,763,239 (DIVERSA), WO 95/08647 and WO 95/11986, and in Bonaldo, Lennon, Soares, Genome Research 1996, 6:791-806; Ali, Holloway, Taylor, Plant Mol Biol Reporter, 2000, 18:123-132.

[0213] Enrichment methods are used to isolate clones representing mRNA which are characteristic of a particular expression state. A number of variations of the method broadly termed subtractive hybridisation are known in the art. Reference may be given to Sive, John, Nucleic Acid Res, 1988, 16:10937; Diatchenko, Lau, Campbell et al, PNAS, 1996, 93:6025-6030; Carninci, Shibata, Hayatsu, Genome Res, 2000, 10:1617-30; Bonaldo, Lennon, Soares, Genome Research 1996, 6:791-806; Ali, Holloway, Taylor, Plant Mol Biol Reporter, 2000, 18:123-132. For example, enrichment may be achieved by doing additional rounds of hybridisation similar to normalisation procedures, using e.g. cDNA from a library of abundant clones or simply a library representing the uninduced state as a driver against a tester library from the induced state. Alternatively mRNA or PCR amplified cDNA derived from the expression state of choice can be used to subtract common sequences from a tester library. The choice of driver and tester population will depend on the nature of target expressible nucleotide sequences in each particular experiment.

[0214] In the library an expressible nucleotide sequence coding for one peptide is preferably found in different but similar vectors under the control of different promoters. Preferably the library comprises at least three primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of three different promoters. More preferably the library comprises at least four primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of four different promoters. More preferably the library comprises at least five primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of five different promoters, such as at least six primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of six different promoters, for example at least seven primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of seven different promoters, for example at least eight primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of eight different promoters, such as at least nine primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of nine different promoters, for example at least ten primary vectors with an expressible nucleotide sequence coding for the same peptide under the control of ten different promoters.

[0215] The expressible nucleotide sequence coding for the same peptide preferably comprises essentially the same nucleotide sequence, more preferably the same nucleotide sequence.

[0216] By having a library with what may be termed one gene under the control of a number of different promoters in different vectors, it is possible to construct from the nucleotide library an array of combinations of genes and promoters. Preferably, one library comprises a complete or substantially complete combination such as a two dimensional array of genes and promoters, wherein substantially all genes are found under the control of substantially all of a selected number of promoters.
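The two dimensional array of genes and promoters can be pictured as a Cartesian product. The following sketch is illustrative only: the gene names are hypothetical placeholders, the promoter names MET25 and CUP1 are borrowed from the entry vectors described earlier, and the helper functions `build_array` and `is_complete` are names invented for this example.

```python
from itertools import product


def build_array(genes, promoters):
    """Map every (gene, promoter) pair to a label for the corresponding primary vector."""
    return {(gene, promoter): f"{promoter}::{gene}" for gene, promoter in product(genes, promoters)}


def is_complete(array, genes, promoters):
    """True when every gene is present under the control of every promoter."""
    return all((gene, promoter) in array for gene in genes for promoter in promoters)


genes = ["geneA", "geneB", "geneC"]      # hypothetical expressible sequences
promoters = ["MET25", "CUP1"]            # promoters named in the entry vectors

array = build_array(genes, promoters)
print(len(array), "vectors in the complete array")

# Removing one construct gives an incomplete (but still usable) array.
incomplete = dict(array)
del incomplete[("geneB", "CUP1")]
print(is_complete(array, genes, promoters), is_complete(incomplete, genes, promoters))
```

The same construction extends to further dimensions (spacers, introns, terminators) by passing more lists to `product`.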

[0217] According to another embodiment of the invention the nucleotide library comprises combinations of expressible nucleotide sequences combined in different vectors with different spacer sequences and/or different intron sequences. Thus any one expressible nucleotide sequence may be combined in a two, three, four or five dimensional array with different promoters and/or different spacers and/or different introns and/or different terminators. The two, three, four or five dimensional array may be complete or incomplete, since not all combinations will have to be present.

[0218] The library may suitably be maintained in a host cell comprising prokaryotic cells or eukaryotic cells. Preferred prokaryotic host organisms may include but are not limited to Escherichia coli, Bacillus subtilis, Streptomyces lividans, Streptomyces coelicolor, Pseudomonas aeruginosa, Myxococcus xanthus.

[0219] Yeast species such as Saccharomyces cerevisiae (budding yeast), Schizosaccharomyces pombe (fission yeast), Pichia pastoris, and Hansenula polymorpha (methylotrophic yeasts) may also be used. Filamentous ascomycetes, such as Neurospora crassa and Aspergillus nidulans may also be used. Plant cells such as those derived from Nicotiana and Arabidopsis are preferred. Preferred mammalian host cells include but are not limited to those derived from humans, monkeys and rodents, such as chinese hamster ovary (CHO) cells, NIH/3T3, COS, 293, VERO, HeLa etc (see Kriegler M. in “Gene Transfer and Expression: A Laboratory Manual”, New York, Freeman & Co. 1990).

[0220] Concatemers

[0221] A concatemer is a series of linked units. In the present context a concatemer is used to denote a number of serially linked nucleotide cassettes, wherein at least two of the serially linked nucleotide units comprise a cassette having the basic structure

[rs₂-SP—PR—X-TR—SP-rs₁]

[0222] wherein

[0223] rs₁ and rs₂ together denote a restriction site,

[0224] SP individually denotes a spacer of at least two nucleotide bases,

[0225] PR denotes a promoter, capable of functioning in a cell,

[0226] X denotes an expressible nucleotide sequence,

[0227] TR denotes a terminator, and

[0228] SP individually denotes a spacer of at least two nucleotide bases.
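The cassette layout defined above can be captured in a small data model. This is a minimal sketch for orientation, not part of the disclosed method: the class name `Cassette`, its field names, the function `is_concatemer` and the example sequences are all hypothetical, and the only rules encoded are the ones stated in the text (spacers of at least two nucleotide bases, at least two serially linked cassettes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cassette:
    """One unit SP-PR-X-TR-SP; the flanking rs2/rs1 half-sites are implied by the enzyme used."""
    spacer5: str        # SP upstream of the promoter
    promoter: str       # PR
    expressible: str    # X
    terminator: str     # TR
    spacer3: str        # SP downstream of the terminator

    def __post_init__(self):
        # Each spacer must be at least two nucleotide bases long.
        if len(self.spacer5) < 2 or len(self.spacer3) < 2:
            raise ValueError("each spacer must contain at least two nucleotide bases")


def is_concatemer(cassettes):
    """A concatemer in the sense used here needs at least two serially linked cassettes."""
    return len(cassettes) >= 2


first = Cassette("AT", "MET25", "geneA", "ADH1", "GC")
second = Cassette("AT", "CUP1", "geneB", "ADH1", "GC")

print(is_concatemer([first]))           # a single cassette is not a concatemer
print(is_concatemer([first, second]))   # two linked cassettes are
```

Because the dataclass is frozen and hashable, `len(set(cassettes))` gives a quick count of how many distinct cassettes a concatemer contains.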

[0229] Optionally the cassettes comprise an intron sequence between the promoter and the expressible nucleotide sequence and/or between the terminator and the expressible sequence.

[0230] The expressible nucleotide sequence in the cassettes of the concatemer may comprise a DNA sequence selected from the group comprising cDNA and genomic DNA.

[0231] According to one aspect of the invention, a concatemer comprises cassettes with expressible nucleotide sequences from different expression states, so that non-naturally occurring combinations or non-native combinations of expressible nucleotide sequences are obtained. These different expression states may represent at least two different tissues, such as at least two organs, such as at least two species, such as at least two genera. The different species may be from at least two different phyla, such as from at least two different classes, such as from at least two different divisions, more preferably from at least two different sub-kingdoms, such as from at least two different kingdoms.

[0232] For example, the expressible nucleotide sequences may originate from eukaryotes such as mammals such as humans, mice or whales, from reptiles such as snakes, crocodiles or turtles, from tunicates such as sea squirts, from lepidoptera such as butterflies and moths, from coelenterates such as jellyfish, anemones, or corals, from fish such as bony and cartilaginous fish, from plants such as dicots, e.g. coffee, oak, or monocots such as grasses, lilies, and orchids; from lower plants such as algae and gingko, from higher fungi such as terrestrial fruiting fungi, from marine actinomycetes. The expressible nucleotide sequences may also originate from protozoans such as malaria or trypanosomes, or from prokaryotes such as E. coli or archaebacteria. Furthermore, the expressible nucleotide sequences may originate from one or more, preferably from more, expression states from the species and genera listed in the table below.

Bacteria: Streptomyces, Micromonospora, Nocardia, Actinomadura, Actinoplanes, Streptosporangium, Microbispora, Kitasatosporia, Azobacterium, Rhizobium, Achromobacterium, Enterobacterium, Brucella, Micrococcus, Lactobacillus, Bacillus (B.t. toxins), Clostridium (toxins), Brevibacterium, Pseudomonas, Aerobacter, Vibrio, Halobacterium, Mycoplasma, Cytophaga, Myxococcus
Fungi: Amanita muscaria (fly agaric, ibotenic acid, muscimol), Psilocybe (psilocybin), Physarum, Fuligo, Mucor, Phytophthora, Rhizopus, Aspergillus, Penicillium (penicillin), Coprinus, Phanerochaete, Acremonium (Cephalosporin), Trichoderma, Helminthosporium, Fusarium, Alternaria, Myrothecium, Saccharomyces
Algae: Digenea simplex (kainic acid, antihelminthic), Laminaria angustata (laminine, hypotensive)
Lichens: Usnea fasciata (vulpinic acid, antimicrobial; usnic acid, antitumor)
Higher Plants: Artemisia (artemisinin), Coleus (forskolin), Desmodium (K channel agonist), Catharanthus (Vinca alkaloids), Digitalis (cardiac glycosides), Podophyllum (podophyllotoxin), Taxus (taxol), Cephalotaxus (homoharringtonine), Camptotheca (Camptothecin), Camellia sinensis (Tea), Cannabis indica, Cannabis sativa (Hemp), Erythroxylum coca (Coca), Lophophora williamsii (Peyote), Myristica fragrans (Nutmeg), Nicotiana, Papaver somniferum (Opium Poppy), Phalaris arundinacea (Reed canary grass)
Protozoa: Ptychodiscus brevis; Dinoflagellates (brevitoxin, cardiovascular)
Sponges: Microciona prolifera (ectyonin, antimicrobial), Cryptotethya crypta (D-arabino furanosides)
Coelenterata: Portuguese Man o War & other jellyfish and medusoid toxins
Corals: Pseudoterogonia species (Pseudoteracins, anti-inflammatory), Erythropodium (erythrolides, anti-inflammatory)
Aschelminths: Nematode secretory compounds
Molluscs: Conus toxins, sea slug toxins, cephalopod neurotransmitters, squid inks
Annelida: Lumbriconereis heteropa (nereistoxin, insecticidal)
Arachnids: Dolomedes (“fishing spider” venoms)
Crustacea: Xenobalanus (skin adhesives)
Insects: Epilachna (mexican bean beetle alkaloids)
Sipunculida: Bonellia viridis (bonellin, neuroactive)
Bryozoans: Bugula neritina (bryostatins, anti cancer)
Echinoderms: Crinoid chemistry
Tunicates: Trididemnum solidum (didemnin, anti-tumor and anti-viral); Ecteinascidia turbinata (ecteinascidins, anti-tumor)
Vertebrates: Eptatretus stoutii (eptatretin, cardioactive), Trachinus draco (proteinaceous toxins, reduce blood pressure, respiration and reduce heart rate). Dendrobatid frogs (batrachotoxins, pumiliotoxins, histrionicotoxins, and other polyamines); Snake venom toxins; Ornithorhynchus anatinus (duck-billed platypus venom), modified carotenoids, retinoids and steroids; Avians: histrionicotoxins, modified carotenoids, retinoids and steroids

[0233] According to a preferred embodiment of the invention the concatemer comprises at least a first cassette and a second cassette, said first cassette being different from said second cassette. More preferably, the concatemer comprises cassettes, wherein substantially all cassettes are different. The difference between the cassettes may arise from differences between promoters, and/or expressible nucleotide sequences, and/or spacers, and/or terminators, and/or introns.

[0234] The number of cassettes in a single concatemer is largely determined by the host species into which the concatemer is eventually to be inserted and the vector through which the insertion is carried out. The concatemer thus may comprise at least 10 cassettes, such as at least 15, for example at least 20, such as at least 25, for example at least 30, such as from 30 to 60 or more than 60, such as at least 75, for example at least 100, such as at least 200, for example at least 500, such as at least 750, for example at least 1000, such as at least 1500, for example at least 2000 cassettes.

[0235] Each of the cassettes may be laid out as described above.

[0236] Once the concatemer has been assembled or concatenated it may be ligated into a suitable vector. Such a vector may advantageously comprise an artificial chromosome. The basic requirements for a functional artificial chromosome have been described in U.S. Pat. No. 4,464,472, the contents of which are hereby incorporated by reference. An artificial chromosome or a functional minichromosome, as it may also be termed, must comprise a DNA sequence capable of replication and stable mitotic maintenance in a host cell comprising a DNA segment coding for centromere-like activity during mitosis of said host and a DNA sequence coding for a replication site recognized by said host.

[0237] Suitable artificial chromosomes include a Yeast Artificial Chromosome (YAC) (see e.g. Murray et al, Nature 305:189-193; or U.S. Pat. No. 4,464,472), a mega Yeast Artificial Chromosome (mega YAC), a Bacterial Artificial Chromosome (BAC), a mouse artificial chromosome, a Mammalian Artificial Chromosome (MAC) (see e.g. U.S. Pat. No. 6,133,503 or U.S. Pat. No. 6,077,697), an Insect Artificial Chromosome (BUGAC), an Avian Artificial Chromosome (AVAC), a Bacteriophage Artificial Chromosome, a Baculovirus Artificial Chromosome, a plant artificial chromosome (U.S. Pat. No. 5,270,201), a BIBAC vector (U.S. Pat. No. 5,977,439) or a Human Artificial Chromosome (HAC).

[0238] The artificial chromosome is preferably so large that the host cell perceives it as a “real” chromosome and maintains it and transmits it as a chromosome. For yeast and other suitable host species, this will often correspond approximately to the size of the smallest native chromosome in the species. For Saccharomyces, the smallest chromosome has a size of 225 Kb.

[0239] MACs may be used to construct artificial chromosomes from other species, such as insect and fish species. The artificial chromosomes preferably are fully functional stable chromosomes. Two types of artificial chromosomes may be used. One type, referred to as SATACs [satellite artificial chromosomes], are stable heterochromatic chromosomes, and the other type are minichromosomes based on amplification of euchromatin.

[0240] Mammalian artificial chromosomes provide extra-genomic specific integration sites for introduction of genes encoding proteins of interest and permit megabase size DNA integration, such as integration of concatemers according to the invention.

[0241] According to another embodiment of the invention, the concatemer may be integrated into the host chromosomes or cloned into other types of vectors, such as a plasmid vector, a phage vector, a viral vector or a cosmid vector.

[0242] A preferable artificial chromosome vector is one that is capable of being conditionally amplified in the host cell, e.g. in yeast. The amplification preferably is at least a 10 fold amplification. Furthermore, it is advantageous that the cloning site of the artificial chromosome vector can be modified to comprise the same restriction site as the one bordering the cassettes described above, i.e. RS2 and/or RS2′.

[0243] Concatenation

[0244] Cassettes to be concatenated are normally excised from a vector either by digestion with restriction enzymes or by PCR. After excision the cassettes may be separated from the vector through size fractionation such as gel filtration or through tagging of known sequences in the cassettes. The isolated cassettes may then be joined together either through interaction between sticky ends or through ligation of blunt ends.

[0245] Single-stranded compatible ends may be created by digestion with restriction enzymes. For concatenation a preferred enzyme for excising the cassettes would be a rare cutter, i.e. an enzyme that recognises a sequence of 7 or more nucleotides. Examples of enzymes that cut very rarely are the meganucleases, many of which are intron encoded, like e.g. I-Ceu I, I-Sce I, I-Ppo I, and PI-Psp I (see example 6d for more). Other preferred enzymes recognize a sequence of 8 nucleotides like e.g. Asc I, AsiS I, CciN I, CspB I, Fse I, MchA I, Not I, Pac I, Sbf I, Sda I, Sgf I, SgrA I, Sse232 I, and Sse8387 I, all of which create single stranded, palindromic compatible ends.
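Why longer recognition sequences make an enzyme a rare cutter can be shown with a back-of-the-envelope estimate. The sketch below assumes, for illustration only, a random sequence in which the four bases are equally frequent and independent, and a fully specified (non-degenerate) recognition sequence; real genomes deviate from this. The function names are hypothetical.

```python
def expected_spacing(site_length):
    """Average distance (in bases) between occurrences of a fully specified site."""
    return 4 ** site_length


def expected_sites(sequence_length, site_length):
    """Expected number of occurrences of the site in a random sequence of the given length."""
    positions = max(sequence_length - site_length + 1, 0)
    return positions / 4 ** site_length


for n in (4, 6, 8):
    print(f"{n}-base site: about one occurrence per {expected_spacing(n)} bases")

# Expected number of internal 8-base sites in a hypothetical 3,000 bp cassette.
print(f"{expected_sites(3000, 8):.3f}")
```

Under these assumptions an 8-base site occurs on average once per 65,536 bases, so a cassette of a few kilobases is unlikely to be cut internally, whereas a 6-base site (once per 4,096 bases) would frequently fall inside it.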

[0246] Other preferred rare cutters, which may also be used to control orientation of individual cassettes in the concatemer, are enzymes that recognize non-palindromic sequences like e.g. Aar I, Sap I, Sfi I, Sdi I, and Vpa (see example 6c for more).

[0247] Alternatively, cassettes can be prepared by the addition of restriction sites to the ends, e.g. by PCR or ligation to linkers (short synthetic dsDNA molecules). Restriction enzymes are continuously being isolated and characterised and it is anticipated that many such novel enzymes can be used to generate single-stranded compatible ends according to the present invention.

[0248] It is conceivable that single stranded compatible ends can be made by cleaving the vector with synthetic cutters. Thus, a reactive chemical group that will normally be able to cleave DNA unspecifically can cut at specific positions when coupled to another molecule that recognises and binds to specific sequences. Examples of molecules that recognise specific dsDNA sequences are DNA, PNA, LNA, phosphorothioates, peptides, and amides. See e.g. Armitage, B. (1998) Chem. Rev. 98: 1171-1200, who describes photocleavage using e.g. anthraquinone and UV light; Dervan P. B. & Bürli R. W. (1999) Curr. Opin. Chem. Biol. 3: 688-93, who describe the specific binding of polyamides to DNA; Nielsen, P. E. (2001) Curr. Opin. Biotechnol. 12: 16-20, who describes the specific binding of PNA to DNA; and Chemical Reviews special thematic issue: RNA/DNA Cleavage (1998) vol. 98 (3) Bashkin J. K. (ed.) ACS publications, which describes several examples of chemical DNA cleavers.

[0249] Single-stranded compatible ends may also be created by using e.g. PCR primers including dUTP and then treating the PCR product with Uracil-DNA glycosylase (Ref: U.S. Pat. No. 5,035,996) to degrade part of the primer. Alternatively, compatible ends can be created by tailing both the vector and insert with complementary nucleotides using Terminal Transferase (Chang, L M S, Bollum T J (1971) J Biol Chem 246:909).

[0250] It is also conceivable that recombination can be used to generate concatemers, e.g. through the modification of techniques like the Creator™ system (Clontech) which uses the Cre-loxP mechanism (Sauer B 1993 Methods Enzymol 225:890-900) to directionally join DNA molecules by recombination or like the Gateway™ system (Life Technologies, U.S. Pat. No. 5,888,732) using lambda att attachment sites for directional recombination (Landy A 1989, Ann Rev Biochem 58:913). It is envisaged that lambda cos site dependent systems can also be developed to allow concatenation.

[0251] More preferably the cassettes may be concatenated without an intervening purification step through excision from a vector with two restriction enzymes, one leaving sticky ends on the cassettes and the other one leaving blunt ends in the vectors. This is the preferred method for concatenation of cassettes from vectors having the basic structure of [RS1-RS2-SP—PR—X-TR—SP—RS2′-RS1′].

[0252] An alternative way of producing concatemers free of vector sequences would be to PCR amplify the cassettes from a single-stranded primary vector. The PCR product must include the restriction sites RS2 and RS2′ which are subsequently cleaved by their cognate enzyme(s). Concatenation can then be performed using the digested PCR product, essentially without interference from the single stranded primary vector template or the small double stranded fragments, which have been cut from the ends.

[0253] The concatemer may be assembled or concatenated by concatenation of at least two cassettes of nucleotide sequences, each cassette comprising a first sticky end, a spacer sequence, a promoter, an expressible nucleotide sequence, a terminator, a spacer sequence, and a second sticky end. A flow chart of the procedure is shown in FIG. 2a.

[0254] Preferably concatenation further comprises

[0255] starting from a primary vector [RS1-RS2-SP—PR—X-TR—SP—RS2′-RS1′],

[0256] wherein X denotes an expressible nucleotide sequence,

[0257] RS1 and RS1′ denote restriction sites,

[0258] RS2 and RS2′ denote restriction sites different from RS1 and RS1′,

[0259] SP individually denotes a spacer sequence of at least two nucleotides,

[0260] PR denotes a promoter,

[0261] TR denotes a terminator,

[0262] i) cutting the primary vector with the aid of at least one restriction enzyme specific for RS2 and RS2′ obtaining cassettes having the general formula [rs₂-SP—PR—X-TR—SP-rs₁] wherein rs₁ and rs₂ together denote a functional restriction site RS2 or RS2′,

[0263] ii) assembling the cut out cassettes through interaction between rs₁ and rs₂.
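Steps i) and ii) can be sketched on plain sequence strings to show how the half-sites rs₂ and rs₁ left on an excised cassette recreate a functional site when two cassettes are joined. The sketch assumes Asc I as the RS2 enzyme (as in the EVE entry vectors) with recognition sequence GGCGCGCC cut after the first GG; this cut position is taken from general enzyme references, not from this text. It follows the top strand only, ignores the complementary strand and cassette orientation, and uses hypothetical flanking and cassette sequences.

```python
ASCI_SITE = "GGCGCGCC"   # assumed recognition sequence of the RS2 enzyme (Asc I)
CUT_OFFSET = 2           # assumed top-strand cut position: GG^CGCGCC


def digest(sequence, site=ASCI_SITE, offset=CUT_OFFSET):
    """Split a linear top-strand sequence at every occurrence of the site."""
    fragments, start = [], 0
    position = sequence.find(site)
    while position != -1:
        cut = position + offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical primary vector region: flank - RS2 - (SP-PR-X-TR-SP) - RS2' - flank.
body = "ATTATAAATGAAATAAATTA"               # stands in for SP-PR-X-TR-SP
vector = "TTTT" + ASCI_SITE + body + ASCI_SITE + "TTTT"

# Step i): excise the cassette; it carries rs2 ("CGCGCC") at its 5' end and rs1 ("GG") at its 3' end.
left_arm, cassette, right_arm = digest(vector)

# Step ii): joining cassettes end to end regenerates the full site at every junction.
concatemer = cassette * 3
print(cassette)
print(concatemer.count(ASCI_SITE), "regenerated sites in a three-cassette concatemer")
```

Because the junctions are again functional sites, digesting the concatemer with the same enzyme releases the individual cassettes, which is what allows cassettes to be recovered and re-assembled later.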

[0264] In this way at least 10 cassettes can be concatenated, such as at least 15, for example at least 20, such as at least 25, for example at least 30, such as from 30 to 60 or more than 60, such as at least 75, for example at least 100, such as at least 200, for example at least 500, such as at least 750, for example at least 1000, such as at least 1500, for example at least 2000.

[0265] According to an especially preferred embodiment, vector arms each having a RS2 or RS2′ in one end and a non-complementary overhang or a blunt end in the other end are added to the concatenation mixture together with the cassettes described above to further simplify the procedure (see FIG. 2b). One example of a suitable vector for providing vector arms is disclosed in FIG. 7. TRP1, URA3, and HIS3 are auxotrophic marker genes, and AmpR is an E. coli antibiotic marker gene. CEN4 is a centromere and TEL are telomeres. ARS1 and PMB1 allow replication in yeast and E. coli respectively. BamH I and Asc I are restriction enzyme recognition sites. The nucleotide sequence of the vector is set forth in SEQ ID NO 4. The vector is digested with BamH I and Asc I to liberate the vector arms, which are used for ligation to the concatemer.

[0266] The ratio of vector arms to cassettes determines the maximum number of cassettes in the concatemer as illustrated in FIG. 8. The vector arms preferably are artificial chromosome vector arms such as those described in FIG. 7.
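How the proportion of vector arms limits concatemer length can be illustrated with a deliberately simple chain-termination model. The model is an assumption made for this sketch and is not taken from FIG. 8: a chain starts at one arm, and each further ligation adds an arm (ending the chain) with probability `arm_fraction` or a cassette otherwise, so that chain length follows a geometric distribution. All function and parameter names are hypothetical.

```python
import random


def mean_cassettes(arm_fraction):
    """Expected number of cassettes per chain under the geometric model."""
    if not 0 < arm_fraction <= 1:
        raise ValueError("arm_fraction must be in (0, 1]")
    return (1 - arm_fraction) / arm_fraction


def simulate_chain(arm_fraction, rng):
    """Grow one chain until an arm is ligated; return the number of cassettes added."""
    cassettes = 0
    while rng.random() >= arm_fraction:
        cassettes += 1
    return cassettes


rng = random.Random(0)   # fixed seed so the sketch is reproducible
for fraction in (0.5, 0.1, 0.02):
    runs = [simulate_chain(fraction, rng) for _ in range(20000)]
    print(f"arm fraction {fraction}: model mean {mean_cassettes(fraction):.1f}, "
          f"simulated mean {sum(runs) / len(runs):.1f}")
```

In this toy model, lowering the share of arm ends from 10% to 2% raises the average chain from about 9 to about 49 cassettes, which is the qualitative trend described above. Real ligation kinetics, circularisation and fragment size effects are ignored.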

[0267] It is of course also possible to add stopper fragments to the concatenation solution, the stopper fragments each having a RS2 or RS2′ in one end and a non-complementary overhang or a blunt end in the other end. The ratio of stopper fragments to cassettes can likewise control the maximum size of the concatemer.

[0268] The complete sequence of steps to be taken, starting with the isolation of mRNA and ending with insertion into an entry vector, may include the following steps:

[0269] i) isolating mRNA from an expression state;

[0270] ii) obtaining substantially full length cDNA corresponding to the mRNA sequences,

[0271] iii) inserting the substantially full length cDNA into a cloning site in a cassette in a primary vector, said cassette being of the general formula in 5′→3′ direction:

[RS1-RS2-SP—PR—CS-TR—SP—RS2′-RS1′]

[0272] wherein CS denotes a cloning site.

[0273] In preparation of the concatemer, genes may be isolated from different entry libraries to provide the desired selection of genes. Accordingly, concatenation may further comprise selection of vectors having expressible nucleotide sequences from at least two different expression states, such as from two different species. The two different species may be from two different classes, such as from two different divisions, more preferably from two different sub-kingdoms, such as from two different kingdoms.

[0274] As an alternative to including vector arms in the concatenation reaction it is possible to ligate the concatemer into an artificial chromosome selected from the group comprising yeast artificial chromosome, mega yeast artificial chromosome, bacterial artificial chromosome, mouse artificial chromosome, human artificial chromosome.

[0275] Preferably at least one inserted concatemer further comprises a selectable marker. The marker(s) are conveniently not included in the concatemer as such but rather in an artificial chromosome vector, into which the concatemer is inserted. Selectable markers generally provide a means to select, for growth, only those cells which contain a vector. Such markers are of two types: drug resistance and auxotrophy. A drug resistance marker enables cells to grow in the presence of an otherwise toxic compound. Auxotrophic markers allow cells to grow in media lacking an essential component by enabling cells to synthesise the essential component (usually an amino acid).

[0276] Illustrative and non-limiting examples of common compounds for which selectable markers are available, with a brief description of their mode of action, follow:

[0277] Prokaryotic

[0278] Ampicillin: interferes with a terminal reaction in bacterial cell wall synthesis. The resistance gene (bla) encodes beta-lactamase which cleaves the beta-lactam ring of the antibiotic thus detoxifying it.

[0279] Tetracycline: prevents bacterial protein synthesis by binding to the 30S ribosomal subunit. The resistance gene (tet) specifies a protein that modifies the bacterial membrane and prevents accumulation of the antibiotic in the cell.

[0280] Kanamycin: binds to the 70S ribosomes and causes misreading of messenger RNA. The resistance gene (npth) modifies the antibiotic and prevents interaction with the ribosome.

[0281] Streptomycin: binds to the 30S ribosomal subunit, causing misreading of messenger RNA. The resistance gene (Sm) modifies the antibiotic and prevents interaction with the ribosome.

[0282] Zeocin: this new bleomycin-family antibiotic intercalates into the DNA and cleaves it. The Zeocin resistance gene encodes a 13,665 dalton protein. This protein confers resistance to Zeocin by binding to the antibiotic and preventing it from binding DNA. Zeocin is effective on most aerobic cells and can be used for selection in mammalian cell lines, yeast, and bacteria.

[0283] Eukaryotic

[0284] Hygromycin: an aminocyclitol that inhibits protein synthesis by disrupting ribosome translocation and promoting mistranslation. The resistance gene (hph) detoxifies hygromycin B by phosphorylation.

[0285] Histidinol: cytotoxic to mammalian cells by inhibiting histidyl-tRNA synthesis in histidine free media. The resistance gene (hisD) product inactivates histidinol toxicity by converting it to the essential amino acid, histidine.

[0286] Neomycin (G418): blocks protein synthesis by interfering with ribosomal functions. The resistance gene ADH encodes aminoglycoside phosphotransferase which detoxifies G418.

[0287] Uracil: Laboratory yeast strains carrying a mutated gene which encodes orotidine-5′-phosphate decarboxylase, an enzyme essential for uracil biosynthesis, are unable to grow in the absence of exogenous uracil. A copy of the wild-type gene (ura4+, S. pombe or URA3, S. cerevisiae) carried on the vector will complement this defect in transformed cells.

[0288] Adenosine: Laboratory strains carrying a deficiency in adenosine synthesis may be complemented by a vector carrying the wild type gene, ADE2.

[0289] Amino acids: Vectors carrying the wild-type genes for LEU2, TRP1, HIS3 or LYS2 may be used to complement strains of yeast deficient in these genes.

[0290] Zeocin: this new bleomycin-family antibiotic intercalates into the DNA and cleaves it. The Zeocin resistance gene encodes a 13,665 dalton protein. This protein confers resistance to Zeocin by binding to the antibiotic and preventing it from binding DNA. Zeocin is effective on most aerobic cells and can be used for selection in mammalian cell lines, yeast, and bacteria.

[0291] Transgenic Cells

[0292] In one aspect of the invention, the concatemers comprising the multitude of cassettes are introduced into a host cell, in which the concatemers can be maintained and the expressible nucleotide sequences can be expressed in a co-ordinated way. The cassettes comprised in the concatemers may be isolated from the host cell and re-assembled due to their uniform structure with, preferably, concatemer restriction sites between the cassettes.

[0293] The host cells selected for this purpose are preferably cultivable under standard laboratory conditions using standard culture conditions, such as standard media and protocols. Preferably the host cells comprise a substantially stable cell line, in which the concatemers can be maintained for generations of cell division. Standard techniques for transformation of the host cells and in particular methods for insertion of artificial chromosomes into the host cells are known.

[0294] It is also of advantage if the host cells are capable of undergoing meiosis to perform sexual recombination. It is also advantageous that meiosis is controllable through external manipulations of the cell culture. One especially advantageous host cell type is one where the cells can be manipulated through external manipulations into different mating types.

[0295] The genomes of a number of species have already been sequenced more or less completely and the sequences can be found in databases. The list of species for which the whole genome has been sequenced increases constantly. Preferably the host cell is selected from the group of species for which the whole genome or essentially the whole genome has been sequenced. The host cell should preferably be selected from a species that is well described in the literature with respect to genetics, metabolism and physiology, such as a model organism used for genomics research.

[0296] The host organism should preferably be conditionally deficient in the ability to undergo homologous recombination. The host organism should preferably have a codon usage similar to that of the donor organisms. Furthermore, in the case of genomic DNA, if eukaryotic donor organisms are used, it is preferable that the host organism has the ability to process the donor messenger RNA properly, e.g., splice out introns.

[0297] The host cells can be bacterial, archaebacterial, or eukaryotic and can constitute a homogeneous cell line or mixed culture. Suitable cells include the bacterial and eukaryotic cell lines commonly used in genetic engineering and protein expression.

[0298] Preferred prokaryotic host organisms may include but are not limited to Escherichia coli, Bacillus subtilis, B. licheniformis, B. cereus, Streptomyces lividans, Streptomyces coelicolor, Pseudomonas aeruginosa, Myxococcus xanthus, Rhodococcus, Streptomycetes, Actinomycetes, Corynebacteria, Bacillus, Pseudomonas, Salmonella, and Erwinia. The complete genome sequences of E. coli and Bacillus subtilis are described by Blattner et al., Science 277, 1454-1462 (1997) and Kunst et al., Nature 390, 249-256 (1997).

[0299] Preferred eukaryotic host organisms are mammals, fish, insects,plants, algae and fungi.

[0300] Examples of mammalian cells include those from, e.g., monkey, mouse, rat, hamster, primate, and human, both cell lines and primary cultures. Preferred mammalian host cells include but are not limited to those derived from humans, monkeys and rodents, such as chinese hamster ovary (CHO) cells, NIH/3T3, COS, 293, VERO, HeLa etc (see Kriegler M. in “Gene Transfer and Expression: A Laboratory Manual”, New York, Freeman & Co. 1990), and stem cells, including embryonic stem cells and hemopoietic stem cells, zygotes, fibroblasts, lymphocytes, kidney, liver, muscle, and skin cells.

[0301] Examples of insect cells include baculo lepidoptera.

[0302] Examples of plant cells include maize, rice, wheat, cotton, soybean, and sugarcane. Plant cells such as those derived from Nicotiana and Arabidopsis are preferred.

[0303] Examples of fungi include Penicillium, Aspergillus, such as Aspergillus nidulans, Podospora, Neurospora, such as Neurospora crassa, Saccharomyces, such as Saccharomyces cerevisiae (budding yeast), Schizosaccharomyces, such as Schizosaccharomyces pombe (fission yeast), Pichia spp., such as Pichia pastoris, and Hansenula polymorpha (methylotrophic yeasts).

[0304] In a preferred embodiment the host cell is a yeast cell, and an illustrative and not limiting list of suitable yeast host cells comprises: baker's yeast, Kluyveromyces marxianus, K. lactis, Candida utilis, Phaffia rhodozyma, Saccharomyces boulardii, Pichia pastoris, Hansenula polymorpha, Yarrowia lipolytica, Candida paraffinica, Schwanniomyces castellii, Pichia stipitis, Candida shehatae, Rhodotorula glutinis, Lipomyces lipofer, Cryptococcus curvatus, Candida spp. (e.g. C. palmioleophila), Yarrowia lipolytica, Candida guilliermondii, Candida, Rhodotorula spp., Saccharomycopsis spp., Aureobasidium pullulans, Candida brumptii, Candida hydrocarbofumarica, Torulopsis, Candida tropicalis, Saccharomyces cerevisiae, Rhodotorula rubra, Candida flaveri, Eremothecium ashbyii, Pichia spp., Pichia pastoris, Kluyveromyces, Hansenula, Kloeckera, Pichia, Pachysolen spp., or Torulopsis bombicola.

[0305] The choice of host will depend on a number of factors determined by the intended use of the engineered host, including pathogenicity, substrate range, environmental hardiness, presence of key intermediates, ease of genetic manipulation, and likelihood of promiscuous transfer of genetic information to other organisms. Particularly advantageous hosts are E. coli, lactobacilli, Streptomycetes, Actinomycetes, Saccharomyces and filamentous fungi.

[0306] In any one host cell it is possible to make all sorts of combinations of expressible nucleotide sequences from all possible sources. Furthermore, it is possible to make combinations of promoters and/or spacers and/or introns and/or terminators in combination with one and the same expressible nucleotide sequence.

[0307] Thus in any one cell there may be expressible nucleotide sequences from two different expression states. Furthermore, these two different expression states may be from one species or advantageously from two different species. Any one host cell may also comprise expressible nucleotide sequences from at least three species, such as from at least four, five, six, seven, eight, nine or ten species, or from more than 15 species such as from more than 20 species, for example from more than 30, 40 or 50 species, such as from more than 100 different species, for example from more than 300 different species, such as from more than 500 different species, for example from more than 1000 different species, thereby obtaining combinations of large numbers of expressible nucleotide sequences from a large number of species. In this way potentially unlimited numbers of combinations of expressible nucleotide sequences can be combined across different expression states. These different expression states may represent at least two different tissues, such as at least two organs, such as at least two species, such as at least two genera. The different species may be from at least two different phyla, such as from at least two different classes, such as from at least two different divisions, more preferably from at least two different sub-kingdoms, such as from at least two different kingdoms.

[0308] Any two of these species may be from two different classes, such as from two different divisions, more preferably from two different sub-kingdoms, such as from two different kingdoms. Thus expressible nucleotide sequences may be combined from a eukaryote and a prokaryote into one and the same cell.

[0309] According to another embodiment of the invention, the expressible nucleotide sequences may be from one and the same expression state. The products of these sequences may interact with the products of the genes in the host cell and form new enzyme combinations leading to novel biochemical pathways. Furthermore, by putting the expressible nucleotide sequences under the control of a number of promoters it becomes possible to switch groups of genes on and off in a co-ordinated manner. By doing this with expressible nucleotide sequences from only one expression state, novel combinations of genes are also expressed.

[0310] The number of concatemers in one single cell may be at least one concatemer per cell, preferably at least 2 concatemers per cell, more preferably 3 per cell, such as 4 per cell, more preferably 5 per cell, such as at least 5 per cell, for example at least 6 per cell, such as 7, 8, 9 or 10 per cell, for example more than 10 per cell. As described above, each concatemer may preferably comprise up to 1000 cassettes, and it is envisaged that one concatemer may comprise up to 2000 cassettes. By inserting up to 10 concatemers into one single cell, this cell may thus be enriched with up to 20,000 heterologous expressible genes, which under suitable conditions may be turned on and off by regulation of the regulatable promoters.
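The upper bound quoted above is a simple product of concatemers per cell and cassettes per concatemer. The following minimal sketch reproduces that arithmetic; the function name is hypothetical, and the calculation assumes one expressible gene per cassette.

```python
def max_heterologous_genes(concatemers_per_cell: int,
                           cassettes_per_concatemer: int) -> int:
    """Upper bound on heterologous genes in one cell.

    Assumption (not stated as a formula in the text): each cassette
    carries exactly one expressible nucleotide sequence.
    """
    if concatemers_per_cell < 0 or cassettes_per_concatemer < 0:
        raise ValueError("counts must be non-negative")
    return concatemers_per_cell * cassettes_per_concatemer


if __name__ == "__main__":
    # 10 concatemers of up to 2000 cassettes each, as in the paragraph above
    print(max_heterologous_genes(10, 2000))
```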

[0311] Often it is more preferable to provide cells having anywhere between 10 and 1000 heterologous genes, such as 20-900 heterologous genes, for example 30 to 800 heterologous genes, such as 40 to 700 heterologous genes, for example 50 to 600 heterologous genes, such as from 60 to 300 heterologous genes or from 100 to 400 heterologous genes which are inserted as 2 to 4 artificial chromosomes each containing one concatemer of genes. The genes may advantageously be located on 1 to 10, such as from 2 to 5, different concatemers in the cells. Each concatemer may advantageously comprise from 10 to 1000 genes, such as from 10 to 750 genes, such as from 10 to 500 genes, such as from 10 to 200 genes, such as from 20 to 100 genes, for example from 30 to 60 genes, or from 50 to 100 genes.

[0312] The concatemers may be inserted into the host cells according to any known transformation technique, preferably according to such transformation techniques that ensure stable and not transient transformation of the host cell. The concatemers may thus be inserted as an artificial chromosome which is replicated by the cells as they divide or they may be inserted into the chromosomes of the host cell. The concatemer may also be inserted in the form of a plasmid such as a plasmid vector, a phage vector, a viral vector or a cosmid vector that is replicated by the cells as they divide. Any combination of the three insertion methods is also possible. One or more concatemers may thus be integrated into the chromosome(s) of the host cell and one or more concatemers may be inserted as plasmids or artificial chromosomes. One or more concatemers may be inserted as artificial chromosomes and one or more may be inserted into the same cell via a plasmid.

EXAMPLES

Example 1

[0313] In Examples 1-3 an Asc1 site was introduced into the EcoR1 site in pYAC4 (Sigma, Burke D T et al. 1987, Science vol 236, p 806), so that sticky ends match the Asc1 site (=RS2 in the general formula of this patent) of the cassettes in pEVE vectors.

[0314] Preparation of EVACs (EVolvable Artificial Chromosomes) Including Size Fractioning

[0315] Preparation of pYAC4-Asc Arms

[0316] 1. inoculate 150 ml of LB (Sigma) with a single colony of E. coli DH5α containing pYAC4-Asc

[0317] 2. grow to OD600˜1, harvest cells and make plasmid preparation

[0318] 3. digest 100 μg pYAC4-Asc w. BamH1 and Asc1

[0319] 4. dephosphorylate fragments and heat inactivate phosphatase (20 min, 80 C)

[0320] 5. purify fragments (e.g. Qiaquick Gel Extraction Kit)

[0321] 6. run 1% agarose gel to estimate amount of fragment

[0322] Preparation of Expression Cassettes

[0323] 1. take 100 μg of plasmid preparation from each of the following libraries

[0324] a) pMA-CAR

[0325] b) pCA-CAR

[0326] c) Phaffia cDNA library

[0327] d) Carrot cDNA library

[0328] 2. digest w. Srf1 (10 units/prep, 37 C overnight)

[0329] 3. dephosphorylate (10 units/prep, 37 C, 2 h)

[0330] 4. heat inactivate 80 C, 20 min

[0331] 5. concentrate and change buffer (precipitation or ultrafiltration)

[0332] 6. digest w. Asc1 (10 units/prep, 37 C, overnight)

[0333] 7. adjust volume of preps to 100 μL

[0334] Preparation of EVACs

[0335] Different types of EVACs have been made by varying the ratio of the different libraries that go into the ligation reaction.

EVAC   pMA-CAR   pCA-CAR   Phaffia cDNA   Carrot cDNA
A      40%       40%       10%            10%
B      25%       25%       25%            25%
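To make the library ratios concrete, the sketch below converts the percentages for EVAC types A and B into amounts for a 100 μg cassette mixture (the amount named in the ligation step). It assumes the percentages are by mass, which the text does not state explicitly; the names `EVAC_MIXES` and `cassette_amounts_ug` are hypothetical.

```python
# Library percentages for the two EVAC types described in the text.
EVAC_MIXES = {
    "A": {"pMA-CAR": 40, "pCA-CAR": 40, "Phaffia cDNA": 10, "Carrot cDNA": 10},
    "B": {"pMA-CAR": 25, "pCA-CAR": 25, "Phaffia cDNA": 25, "Carrot cDNA": 25},
}


def cassette_amounts_ug(percentages, total_ug=100.0):
    """Split a total cassette mass (ug) across libraries.

    Assumption: percentages refer to mass fractions of the mixture.
    """
    if sum(percentages.values()) != 100:
        raise ValueError("library percentages must sum to 100")
    return {name: total_ug * pct / 100 for name, pct in percentages.items()}


if __name__ == "__main__":
    for evac_type, mix in EVAC_MIXES.items():
        print(evac_type, cassette_amounts_ug(mix))
```

Other library ratios can be explored by adding entries to the dictionary, provided each set of percentages sums to 100.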

[0336] 1. add ˜100 ng of pYAC4-Asc arms per 100 μg of cassette mixture

[0337] 2. concentrate to <33.5 μL

[0338] 3. add 2.5 units of T4 DNA-ligase + 4 μL 10× ligase buffer. Adjust to 40 μL

[0339] 4. ligate 3 h, 16 C

[0340] 5. stop reaction by adding 2 μL of 500 mM EDTA

[0341] 6. bring reaction volume to 125 μL, add 25 μL loading mix, heat at 60 C for 5 min

[0342] 7. distribute evenly in 10 wells of a 1% LMP agarose gel

[0343] 8. run pulsed field gel (CHEF III, 1% LMP agarose, ½ strength TBE (BioRad), angle 120, temperature 12 C, voltage 5.6 V/cm, switch time ramping 5-25 s, run time 30 h)

[0344] 9. stain part of the gel that contains molecular weight markers + 1 sample lane for quality check

[0345] 10. cut remaining 9 sample lanes corresponding to mw. 97-194 kb (fraction 1); 194-291 kb (fraction 2); 291-365 kb (fraction 3) from the gel

[0346] 11. agarase gel in high NaCl agarase buffer. 1 u agarase/100 μg gel. 40 C 3 h

[0347] 12. concentrate preparation to <20 μL

[0348] 13. transform suitable yeast strain w. preparation using alkali/cation transformation

[0349] 14. plate on selective minimal media plates

[0350] 15. incubate 30 C for 4-5 days

[0351] 16. pick colonies

[0352] 17. analyse colonies

Example 2

[0353] Preparation of EVACs (EVolvable Artificial Chromosomes) with Direct Transformation

[0354] Preparation of pYAC4-Asc Arms

[0355] 1. inoculate 150 ml of LB with a single colony of DH5α containing pYAC4-Asc

[0356] 2. grow to OD600˜1, harvest cells and make plasmid preparation

[0357] 3. digest 100 μg pYAC4-Asc w. BamH1 and Asc1

[0358] 4. dephosphorylate fragments and heat inactivate phosphatase (20 min, 80 C)

[0359] 5. purify fragments (e.g. Qiaquick Gel Extraction Kit)

[0360] 6. run 1% agarose gel to estimate amount of fragment

[0361] Preparation of Expression Cassettes

[0362] 1. take 100 μg of plasmid preparation from each of the following libraries

[0363] e) pMA-CAR

[0364] f) pCA-CAR

[0365] g) Phaffia cDNA library

[0366] h) Carrot cDNA library

[0367] 2. digest w. Srf1 (10 units/prep, 37 C overnight)

[0368] 3. dephosphorylate (10 units/prep, 37 C, 2 h)

[0369] 4. heat inactivate 80 C, 20 min

[0370] 5. concentrate and change buffer (precipitation or ultrafiltration)

[0371] 6. digest w. Asc1 (10 units/prep, 37 C, overnight)

[0372] 7. adjust volume of preps to 100 μL

[0373] Preparation of EVACs

[0374] Different types of EVACs have been made by varying the ratio of the different libraries that go into the ligation reaction.

EVAC   pMA-CAR   pCA-CAR   Phaffia cDNA   Carrot cDNA
A      40%       40%       10%            10%
B      25%       25%       25%            25%

[0375] 1. concentrate to <32 μL

[0376] 2. add 1 unit of T4 DNA-ligase + 4 μL 10× ligase buffer. Adjust to 40 μL

[0377] 3. ligate 2 h, 16 C

[0378] 4. stop reaction by adding 2 μL of 500 mM EDTA, heat inactivate 60 C, 20 min

[0379] 5. bring reaction volume to 500 μL with dH₂O, concentrate to 30 μL

[0380] 6. add 10 U Asc1, 4 μL 10×Asc1 buffer, bring to 40 μL

[0381] 7. incubate at 37 C for 1 h (alternatively 15 min 30 min)

[0382] 8. heat inactivate 60 C, 20 min

[0383] 9. add 2 μg pYAC4-Asc arms, 1 U T4 DNA ligase, 10 μL 10× ligase buffer, bring to 100 μL

[0384] 10. incubate overnight (ON), 16 C

[0385] 11. add water to 500 μL

[0386] 12. concentrate to 25 μL

[0387] 13. transform suitable yeast strain w. preparation using alkali/cation transformation or other suitable transformation method

[0388] 14. plate on selective minimal media plates

[0389] 15. incubate 30 C for 4-5 days

[0390] 16. pick colonies

[0391] 17. analyse colonies

Example 3

[0392] Preparation of EVACs (EVolvable Artificial Chromosomes) (Small Scale Preparation)

[0393] Preparation of Expression Cassettes

[0394] 1. inoculate 5 ml of LB-medium (Sigma) with library inoculum corresponding to a 10+ fold representation of library. Grow overnight

[0395] 2. make plasmid miniprep from 1.5 ml of culture (e.g. Qiaprep spin miniprep kit)

[0396] 3. digest plasmid w. Srf1

[0397] 4. dephosphorylate fragments and heat inactivate phosphatase (20 min, 80 C)

[0398] 5. digest w. Asc1

[0399] 6. run 1/10 of reaction in 1% agarose to estimate amount of fragment

[0400] Preparation of pYAC4-Asc Arms

[0401] 1. inoculate 150 ml of LB with a single colony of E. coli DH5α containing pYAC4-Asc

[0402] 2. grow to OD600˜1, harvest cells and make plasmid preparation

[0403] 3. digest 100 μg pYAC4-Asc w. BamH1 and Asc1

[0404] 4. dephosphorylate fragments and heat inactivate phosphatase (20 min, 80 C)

[0405] 5. purify fragments (e.g. Qiaquick Gel Extraction Kit)

[0406] 6. run 1% agarose gel to estimate amount of fragment

[0407] Preparation of EVACs

[0408] 1. mix expression cassette fragments with YAC-arms so that cassette/arm ratio is ˜1000/1

[0409] 2. if needed concentrate mixture (use e.g. Microcon YM30) so fragment concentration > 75 ng/μL reaction

[0410] 3. add 1 U T4 DNA ligase, incubate 16 C, 1-3 h. Stop reaction by adding 1 μL of 500 mM EDTA

[0411] 4. run pulsed field gel (CHEF III, 1% LMP agarose, ½ strength TBE, angle 120, temperature 12 C, voltage 5.6 V/cm, switch time ramping 5-25 s, run time 30 h). Load sample in 2 lanes.

[0412] 5. stain part of the gel that contains molecular weight markers

[0413] 6. cut sample lanes corresponding to mw. 100-200 kb

[0414] 7. agarase gel in high NaCl agarase buffer. 1 U agarase/100 mg gel

[0415] 8. concentrate preparation to <20 μL

[0416] 9. transform suitable yeast strain w. preparation using electroporation

[0417] 10. plate on selective minimal media plates

[0418] 11. incubate 30 C for 4-5 days

[0419] 12. pick colonies

Example 4 cDNA Libraries Used in the Production of EVACs

[0420] 1. Daucus carota, carrot root library:

[0421] Full length

[0422] Oligo dT primed, directional cDNA library

[0423] cDNA library made using a pool of 3 Evolva EVE 4, 5 & 8 vectors (FIGS. 4, 5, 6)

[0424] Number of independent clones: 41.6×10⁶

[0425] Average size: 0.9-2.9 kb

[0426] Number of different genes present: 5000-10000

[0427] 2. Xanthophyllomyces dendrorhous (yeast), whole organism library

[0428] Full length

[0429] Oligo dT primed, directional cDNA library

[0430] cDNA library made using a pool of 3 Evolva EVE 4, 5 & 8 vectors (FIGS. 4, 5, 6)

[0431] Number of independent clones: 48.0×10⁶

[0432] Average size: 1.0-3.8 kb

[0433] Number of different genes present: 5000-10000

[0434] 3. Target carotenoid gene cDNA library

[0435] Full length and normalised

[0436] Directional cDNA cloning

[0437] Library made by cloning each gene individually in 2 Evolva EVE 4, 5 & 8 vectors (FIGS. 4, 5, 6)

[0438] Number of different genes: 48

[0439] Species and genes used:

[0440] Gentiana sp., ggps, psy, pds, zds, lcy-b, lcy-e, bhy, zep

[0441] Rhodobacter capsulatus, idi, crtC, crtF

[0442] Erwinia uredovora, crtE, crtB, crtI, crtY, crtZ

[0443] Nostoc anabaena, zds

[0444] Synechococcus PCC7942, pds

[0445] Erwinia herbicola, crtE, crtB, crtI, crtY, crtZ

[0446] Staphylococcus aureus, crtM, crtN

[0447] Xanthophyllomyces dendrorhous, crtI, crtYb

[0448] Capsicum annuum, ccs, crtL

[0449] Nicotiana tabacum, crtL, bchy

[0450] Prochlorococcus sp., lcy-b, lcy-e

[0451] Saccharomyces cerevisiae, idi

[0452] Corynebacterium sp., crtI, crtYe, crtYf, crtEb

[0453] Lycopersicon esculentum, psy-1

[0454] Neurospora crassa, al1

Example 5 Transformation of EVACs

Example 5a Transformation

[0455] 1. Inoculate a single colony into 100 ml YPD broth and grow with aeration at 30° C. to mid log, 2×10⁶ to 2×10⁷ cells/ml.

[0456] 2. Spin to pellet cells at 400×g for 5 minutes; discard supernatant.

[0457] 3. Resuspend cells in a total of 9 ml TE, pH 7.5. Spin to pellet cells and discard supernatant.

[0458] 4. Gently resuspend cells in 5 ml 0.1 M Lithium/Cesium Acetate solution, pH 7.5.

[0459] 5. Incubate at 30° C. for 1 hour with gentle shaking.

[0460] 6. Spin at 400×g for 5 minutes to pellet cells and discard supernatant.

[0461] 7. Gently resuspend in 1 ml TE, pH 7.5. Cells are now ready for transformation.

[0462] 8. In a 1.5 ml tube combine:

[0463] 100 μl yeast cells

[0464] 5 μl Carrier DNA (10 mg/ml)

[0465] 5 μl Histamine Solution

[0466] ⅕ of an EVAC preparation in a 10 μl volume (max). (One EVAC preparation is made of 100 μg of concatenation reaction mixture)

[0467] 9. Gently mix and incubate at room temperature for 30 minutes.

[0468] 10. In a separate tube, combine 0.8 ml 50% (w/v) PEG 4000 and 0.1 ml TE and 0.1 ml of 1 M LiAc for each transformation reaction. Add 1 ml of this PEG/TE/LiAc mix to each transformation reaction. Mix cells into solution with gentle pipetting.

[0469] 11. Incubate at 30° C. for 1 hour.

[0470] 12. Heat shock at 42° C. for 15 minutes; cool to 30° C.

[0471] 13. Pellet cells in a microcentrifuge at high speed for 5 seconds and remove supernatant.

[0472] 14. Resuspend in 200 μl of rich media and plate in appropriate selective media.

[0473] 15. Incubate at 30° C. for 48-72 hours until transformant colonies appear.
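The PEG/TE/LiAc mix in step 10 is a plain dilution, so its final concentrations follow from C_final = C_stock × V_component / V_total. The sketch below works this through for the volumes given in step 10; the function name is hypothetical, and TE is treated purely as a diluent because its stock concentration is not stated in the protocol.

```python
def final_concentrations(components):
    """Return (total_volume_ul, {name: final_concentration}).

    components: list of (name, volume_ul, stock_concentration or None).
    A stock concentration of None marks a pure diluent.
    """
    total_ul = sum(volume for _, volume, _ in components)
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    finals = {
        name: stock * volume / total_ul
        for name, volume, stock in components
        if stock is not None
    }
    return total_ul, finals


# Volumes from step 10, expressed in microlitres.
PEG_MIX = [
    ("PEG 4000 (% w/v)", 800, 50.0),
    ("TE", 100, None),          # stock strength not given; diluent only
    ("LiAc (M)", 100, 1.0),
]

if __name__ == "__main__":
    total, finals = final_concentrations(PEG_MIX)
    print(total, finals)
```

Under these assumptions the mix comes to 1 ml at roughly 40% (w/v) PEG 4000 and 0.1 M LiAc, before it is further diluted by the transformation reaction itself.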

Example 5b Transformation of EVACs using Electroporation

[0474] 100 ml of YPD is inoculated with one yeast colony and grown to OD₆₀₀=1.3 to 1.5. The culture is harvested by centrifuging at 4000×g and 4° C. The cells are resuspended in 16 ml sterile H₂O. Add 2 ml 10× TE buffer, pH 7.5 and swirl to mix. Add 2 ml 10× lithium acetate solution (1 M, pH 7.5) and swirl to mix. Shake gently 45 min at 30° C. Add 1.0 ml 0.5 M DTE while swirling. Shake gently 15 min at 30° C. The yeast suspension is diluted to 100 ml with sterile water. The cells are washed and concentrated by centrifuging at 4000×g, resuspending the pellet in 50 ml ice-cold sterile water, centrifuging at 4000×g, resuspending the pellet in 5 ml ice-cold sterile water, centrifuging at 4000×g and resuspending the pellet in 0.1 ml ice-cold sterile 1 M sorbitol. The electroporation was done using a Bio-Rad Gene Pulser. In a sterile 1.5-ml microcentrifuge tube 40 μl concentrated yeast cells were mixed with 5 μl 1:10 diluted EVAC preparation. The yeast-DNA mix is transferred to an ice-cold 0.2-cm-gap disposable electroporation cuvette and pulsed at 1.5 kV, 25 μF, 200 Ω. 1 ml ice-cold 1 M sorbitol is added to the cuvette to recover the yeast. Aliquots are spread on selective plates containing 1 M sorbitol. Incubate at 30° C. until colonies appear.
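The pulse settings above imply two derived quantities that are often used when comparing electroporation conditions: the nominal field strength (voltage divided by cuvette gap) and the nominal RC time constant. The sketch below computes both from the stated settings. It assumes the 200 Ω setting dominates the circuit resistance; the actual time constant also depends on the sample's own resistance, so the value is an approximation. Function names are hypothetical.

```python
def field_strength_kv_per_cm(voltage_kv: float, gap_cm: float) -> float:
    """Nominal field strength across the cuvette gap."""
    if gap_cm <= 0:
        raise ValueError("gap must be positive")
    return voltage_kv / gap_cm


def time_constant_ms(capacitance_uf: float, resistance_ohm: float) -> float:
    """Nominal RC time constant in milliseconds (tau = R * C)."""
    return resistance_ohm * capacitance_uf * 1e-6 * 1e3


if __name__ == "__main__":
    # Settings from the protocol: 1.5 kV, 0.2 cm gap, 25 uF, 200 ohm
    print(round(field_strength_kv_per_cm(1.5, 0.2), 3), "kV/cm")
    print(round(time_constant_ms(25, 200), 3), "ms")
```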

Example 6 Rare Restriction Enzymes with Recognition Sequence and Cleavage Points

[0475] In this example, rare restriction enzymes are listed together with their recognition sequence and cleavage points. (^) indicates cleavage points 5′-3′ sequence and (_) indicates cleavage points in the complementary sequence.

6a) Unique, palindromic overhang

AscI         GG^CGCG_CC
AsiSI        GCG_AT^CGC
CciNI        GC^GGCC_GC
CspBI        GC^GGCC_GC
FseI         GG_CCGG^CC
MchAI        GC^GGCC_GC
NotI         GC^GGCC_GC
PacI         TTA_AT^TAA
SbfI         CC_TGCA^GG
SdaI         CC_TGCA^GG
SgfI         GCG_AT^CGC
SgrAI        CR^CCGG_YG
Sse232I      CG^CCGG_CG
Sse8387I     CC_TGCA^GG

6b) No overhang

BstRZ246I    ATTT^AAAT
BstSWI       ATTT^AAAT
MspSWI       ATTT^AAAT
MssI         GTTT^AAAC
PmeI         GTTT^AAAC
SmiI         ATTT^AAAT
SrfI         GCCC^GGGC
SwaI         ATTT^AAAT

6c) Non-palindromic and/or variable overhang

AarI         CACCTGCNNNN^NNNN_
AbeI         CC^TCA_GC
AloI         ^NNNNN_NNNNNNNGAACNNNNNNTCCNNNNNNN_NNNNN^
BaeI         ^NNNNN_NNNNNNNNNNACNNNNGTAYCNNNNNNN_NNNNN^
BbvCI        CC^TCA_GC
CpoI         CG^GWC_CG
CspI         CG^GWC_CG
Pfl27I       RG^GWC_CY
PpiI         ^NNNNN_NNNNNNNGAACNNNNNCTCNNNNNNNN_NNNNN^
PpuMI        RG^GWC_CY
PpuXI        RG^GWC_CY
Psp5II       RG^GWC_CY
PspPPI       RG^GWC_CY
RsrII        CG^GWC_CG
Rsr2I        CG^GWC_CG
SanDI        GG^GWC_CC
SapI         GCTCTTCN^NNN_
SdiI         GGCCN_NNN^NGGCC
SexAI        A^CCWGG_T
SfiI         GGCCN_NNN^NGGCC
Sse1825I     GG^GWC_CC
Sse8647I     AG^GWC_CT
VpaK32I      GCTCTTCN^NNN_

6d) Meganucleases

I-Sce I      TAGGGATAA_CAGG^GTAAT
I-Ceu I      ACGGTC_CTAA^GGTAG
I-Cre I      AAACGTC_GTGA^GACAGTTT
I-Sce II     GGTC_ACCC^TGAAGTA
I-Sce III    GTTTTGG_TAAC^TATTTAT
Endo. Sce I  GATGCTGC_AGGC^ATAGGCTTGTTTA
PI-Sce I     GG_GTGC^GGAGAA
PI-Psp I     TGGCAAACAGCTA_TTAT^GGGTATTATGGGT
I-Ppo I      CTCTC_TTAA^GGTAG
HO           TTTCCGC_AACA^GT
I-Tev I      NN_NN^NNTCAGTAGATGTTTTTCTTGGTCTACCGTTT
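The cleavage notation used in this example can be read mechanically. Writing the cut in the 5′-3′ strand as `^` and the cut in the complementary strand as `_`, the bases between the two marks form the single-stranded overhang, and the order of the marks tells whether it is a 5′ or a 3′ overhang. The sketch below is one possible reading of that notation, not part of the described method; the function names are hypothetical, and it handles only sites with a single cut per strand (it rejects double-cutting entries such as AloI, BaeI and PpiI).

```python
def parse_site(notation: str):
    """Return (sequence, top_cut, bottom_cut) for a string like 'GG^CGCG_CC'.

    Cut positions are offsets into the marker-free sequence. A site with
    '^' but no '_' is treated as blunt (both strands cut at one position).
    """
    if notation.count("^") != 1 or notation.count("_") > 1:
        raise ValueError("only single-cut sites are supported")
    bases = []
    top = bottom = None
    for ch in notation:
        if ch == "^":
            top = len(bases)
        elif ch == "_":
            bottom = len(bases)
        else:
            bases.append(ch)
    if bottom is None:
        bottom = top
    return "".join(bases), top, bottom


def overhang(notation: str):
    """Return (kind, overhang_sequence); kind is "5'", "3'" or "blunt"."""
    seq, top, bottom = parse_site(notation)
    if top == bottom:
        return "blunt", ""
    kind = "5'" if top < bottom else "3'"
    lo, hi = sorted((top, bottom))
    return kind, seq[lo:hi]


if __name__ == "__main__":
    for name, site in [("AscI", "GG^CGCG_CC"),
                       ("PacI", "TTA_AT^TAA"),
                       ("SrfI", "GCCC^GGGC")]:
        print(name, overhang(site))
```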

[0476] More meganucleases have been identified, but their precise sequence of recognition has not been determined, see e.g. www.meganuclease.com

Example 7 Concatemer Size Limitation Experiments (Use of Stoppers)

[0477] Materials used:

[0478] pYAC4 (Sigma, Burke et al. 1987, Science, vol 236, p 806) was digested w. EcoR1 and BamH1 and dephosphorylated

[0479] pSE420 (Invitrogen) was linearised using EcoR1 and used as the model fragment for concatenation.

[0480] T4 DNA ligase (Amersham-Pharmacia Biotech) was used for ligation according to manufacturer's instructions.

[0481] Method: Fragments and arms were mixed in the ratios (concentrations are arbitrary units) indicated on FIGS. 9a and 9b. Ligation was allowed to proceed for 1 h at 16 C. Reaction was stopped by the addition of 1 μL 500 mM EDTA. Products were analysed by standard agarose GE (1% agarose, ½ strength TBE) or by PFGE (CHEF III, 1% LMP agarose, ½ strength TBE, angle 120, temperature 12 C, voltage 5.6 V/cm, switch time ramping 5-25 s, run time 30 h)

[0482] The results are shown in FIG. 9, wherein it is shown that the size of concatemers is proportional to the ratio of cassettes per YAC arms.
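One way to picture why concatemer size tracks the cassette-to-arm ratio is an idealised stoichiometric model, offered here as an assumption rather than as the analysis used for FIG. 9: if every fragment is ligated and every concatemer is capped by exactly two arms (the stoppers), the mean number of cassettes per concatemer is twice the molar cassette/arm ratio. The function names and the cassette length used in the usage lines are hypothetical.

```python
def mean_cassettes_per_concatemer(cassettes: float, arms: float) -> float:
    """Idealised mean cassette count per concatemer.

    Assumptions: molar amounts, complete ligation, and exactly two
    arms terminating each concatemer.
    """
    if arms <= 0:
        raise ValueError("arms must be positive")
    return 2.0 * cassettes / arms


def mean_concatemer_size_kb(cassettes: float, arms: float,
                            cassette_kb: float) -> float:
    """Mean concatemer length, ignoring the length of the arms."""
    return mean_cassettes_per_concatemer(cassettes, arms) * cassette_kb


if __name__ == "__main__":
    # Hypothetical 5 kb cassette; ratios in arbitrary molar units.
    for ratio in (10, 20, 40):
        print(ratio, mean_concatemer_size_kb(ratio, 1, 5.0))
```

In this model doubling the cassette/arm ratio doubles the expected concatemer length, which is the proportionality reported above; real ligations will deviate through circularisation and incomplete ligation.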

Example 8 Integration of Expression Cassettes into Artificial Chromosomes

[0483] Integration of expression cassettes into YAC12 was done essentially as done by Sears D. D., Hieter P., Simchen G., Genetics, 1994, 138, 1055-1065.

[0484] An AscI site was introduced into the Bgl II site of the integration vectors pGS534 and pGS525.

[0485] A β-galactosidase gene, as well as crtE, crtB, crtI and crtY from Erwinia uredovora were cloned into pEVE4. These expression cassettes were ligated into AscI of the modified integration vectors pGS534 and pGS525.

[0486] Linearised pGS534 and pGS525 containing the expression cassettes were transformed into haploid yeast strains containing the appropriate target YAC which carries the Ade” gene. Red Ade-transformants were selected (the parent host strain is red due to the ade2-101 mutation).

[0487] Additional confirmation of correct integration of the β-galactosidase expression cassette was done using a β-galactosidase assay.

Example 9 Re-Transformation of Cells that Already Contain Artificial Chromosomes to Obtain at Least 2 Artificial Chromosomes Per Cell

[0488] Yeast strains containing YAC12 (Sears D. D., Hieter P., Simchen G., Genetics, 1994, 138, 1055-1065) were transformed with EVACs following the protocol described in Example 5a. The transformed cells were plated on plates that select for cells that contained both YAC12 and EVACs.

Example 10 Example of Different Expression Patterns “Phenotypes” Obtained Using the Same Yeast Clones under Different Expression Conditions

[0489] Colonies were picked with a sterile toothpick and streaked sequentially onto plates corresponding to the four repressed and/or induced conditions (-Ura/-Trp, -Ura/-Trp/-Met, -Ura/-Trp/+200 μM CuSO₄, -Ura/-Trp/-Met/+200 μM CuSO₄). 20 mg adenine was added to the media to suppress the ochre phenotype.

1 4 1 3417 DNA Synthetic, vector Eve4 misc_feature (1902)..(2759)Ampicillin resistance gene 1 ctgatttgcc cgggcagttc aggctcatca ggcgcgccatgcagggattc ttcggatgca 60 agggttcgaa tcccttagct ctcattattt tttgctttttctcttgaggt cacatgatcg 120 caaaatggca aatggcacgt gaagctgtcg atattggggaactgtggtgg ttggcaaatg 180 actaattaag ttagtcaagg cgccatcctc atgaaaactgtgtaacataa taaccgaagt 240 gtcgaaaagg tggcaccttg tccaattgaa cacgctcgatgaaaaaaata agatatatat 300 aaggttaagt aaagcgtctg ttagaaagga agtttttcctttttcttgct ctcttgtctt 360 ttcatctact atttccttcg tgtaatacag ggtcgtcagatacatagata caattctatt 420 acccccatcc atacaagctt ggcgccgaat tcgtcgacccggggatccgc ggccgcaggc 480 ctaaattgat ctagagcttt ggacttcttc gccagaggtttggtcaagtc tccaatcaag 540 gttgtcggct tgtctacctt gccagaaatt tacgaaaagatggaaaaggg tcaaatcgtt 600 ggtagatacg ttgttgacac ttctaaataa gcgaatttcttatgatttat gatttttatt 660 attaaataag ttataaaaaa aataagtgta tacaaattttaaagtgactc ttaggtttta 720 aaacgaaaat tcttgttctt gagtaactct ttcctgtaggtcaggttgct ttctcaggta 780 tagcatgagg tcgctcttat tgaccacacc tctaccggcatgcccatggg ttaactgatc 840 aatgcatcct gcatggcgcg cctgatgagc ctgaactgcccgggcaaatc agctggacgt 900 ctgcctgcat taatgaatcg gccaacgcgc ggggagaggcggtttgcgta ttgggcgctc 960 ttccgcttcc tcgctcactg actcgctgcg ctcggtcgttcggctgcggc gagcggtatc 1020 agctcactca aaggcggtaa tacggttatc cacagaatcaggggataacg caggaaagaa 1080 catgtgagca aaaggccagc aaaaggccag gaaccgtaaaaaggccgcgt tgctggcgtt 1140 tttccatagg ctccgccccc ctgacgagca tcacaaaaatcgacgctcaa gtcagaggtg 1200 gcgaaacccg acaggactat aaagatacca ggcgtttccccctggaagct ccctcgtgcg 1260 ctctcctgtt ccgaccctgc cgcttaccgg atacctgtccgcctttctcc cttcgggaag 1320 cgtggcgctt tctcatagct cacgctgtag gtatctcagttcggtgtagg tcgttcgctc 1380 caagctgggc tgtgtgcacg aaccccccgt tcagcccgaccgctgcgcct tatccggtaa 1440 ctatcgtctt gagtccaacc cggtaagaca cgacttatcgccactggcag cagccactgg 1500 taacaggatt agcagagcga ggtatgtagg cggtgctacagagttcttga agtggtggcc 1560 taactacggc tacactagaa ggacagtatt tggtatctgcgctctgctga agccagttac 1620 cttcggaaaa agagttggta gctcttgatc 
cggcaaacaaaccaccgctg gtagcggtgg 1680 tttttttgtt tgcaagcagc agattacgcg cagaaaaaaaggatctcaag aagatccttt 1740 gatcttttct acggggtctg acgctcagtg gaacgaaaactcacgttaag ggattttggt 1800 catgagatta tcaaaaagga tcttcaccta gatccttttaaattaaaaat gaagttttaa 1860 atcaatctaa agtatatatg agtaaacttg gtctgacagttaccaatgct taatcagtga 1920 ggcacctatc tcagcgatct gtctatttcg ttcatccatagttgcctgac tccccgtcgt 1980 gtagataact acgatacggg agggcttacc atctggccccagtgctgcaa tgataccgcg 2040 agacccacgc tcaccggctc cagatttatc agcaataaaccagccagccg gaagggccga 2100 gcgcagaagt ggtcctgcaa ctttatccgc ctccatccagtctattaatt gttgccggga 2160 agctagagta agtagttcgc cagttaatag tttgcgcaacgttgttgcca ttgctacagg 2220 catcgtggtg tcacgctcgt cgtttggtat ggcttcattcagctccggtt cccaacgatc 2280 aaggcgagtt acatgatccc ccatgttgtg caaaaaagcggttagctcct tcggtcctcc 2340 gatcgttgtc agaagtaagt tggccgcagt gttatcactcatggttatgg cagcactgca 2400 taattctctt actgtcatgc catccgtaag atgcttttctgtgactggtg agtactcaac 2460 caagtcattc tgagaatagt gtatgcggcg accgagttgctcttgcccgg cgtcaatacg 2520 ggataatacc gcgccacata gcagaacttt aaaagtgctcatcattggaa aacgttcttc 2580 ggggcgaaaa ctctcaagga tcttaccgct gttgagatccagttcgatgt aacccactcg 2640 tgcacccaac tgatcttcag catcttttac tttcaccagcgtttctgggt gagcaaaaac 2700 aggaaggcaa aatgccgcaa aaaagggaat aagggcgacacggaaatgtt gaatactcat 2760 actcttcctt tttcaatatt attgaagcat ttatcagggttattgtctca tgagcggata 2820 catatttgaa tgtatttaga aaaataaaca aataggggttccgcgcacat ttccccgaaa 2880 agtgccacct gacgcgccct gtagcggcgc attaagcgcggcgggtgtgg tggttacgcg 2940 cagcgtgacc gctacacttg ccagcgccct agcgcccgctcctttcgctt tcttcccttc 3000 ctttctcgcc acgttcgccg gctttccccg tcaagctctaaatcgggggc tccctttagg 3060 gttccgattt agtgctttac ggcacctcga ccccaaaaaacttgattagg gtgatggttc 3120 acgtagtggg ccatcgccct gatagacggt ttttcgccctttgacgttgg agtccacgtt 3180 ctttaatagt ggactcttgt tccaaactgg aacaacactcaaccctatct cggtctattc 3240 ttttgattta taagggattt tgccgatttc ggcctattggttaaaaaatg agctgattta 3300 acaaaaattt aacgcgaatt ttaacaaaat attaacgcttacaatttcca ttcgccattc 3360 
aggctgcgca actgttggga agggcgatcg gtgcgggcctcttcgctatt acgccag 3417 2 3501 DNA Synthetic, vector EVE5 misc_feature(1986)..(2843) Ampicillin resistance gene 2 ctgatttgcc cgggcagttcaggctcatca ggcgcgccat gcagggataa gccgatccca 60 ttaccgacat ttgggcgctatacgtgcata tgttcatgta tgtatctgta tttaaaacac 120 ttttgtatta tttttcctcatatatgtgta taggtttata cggatgattt aattattact 180 tcaccaccct ttatttcaggctgatatctt agccttgtta ctagttagaa aaagacattt 240 ttgctgtcag tcactgtcaagagattcttt tgctggcatt tcttctagaa gcaaaaagag 300 cgatgcgtct tttccgctgaaccgttccag caaaaaagac taccaacgca atatggattg 360 tcagaatcat ataaaagagaagcaaataac tccttgtctt gtatcaattg cattataata 420 tcttcttgtt agtgcaatatcatatagaag tcatcgaaat agatattaag aaaaacaaac 480 tgtacaatca atcaatcaatcatcacataa aatgttcaaa gcttggcgcc gaattcgtcg 540 acccggggat ccgcggccgcaggcctaaat tgatctagag ctttggactt cttcgccaga 600 ggtttggtca agtctccaatcaaggttgtc ggcttgtcta ccttgccaga aatttacgaa 660 aagatggaaa agggtcaaatcgttggtaga tacgttgttg acacttctaa ataagcgaat 720 ttcttatgat ttatgatttttattattaaa taagttataa aaaaaataag tgtatacaaa 780 ttttaaagtg actcttaggttttaaaacga aaattcttgt tcttgagtaa ctctttcctg 840 taggtcaggt tgctttctcaggtatagcat gaggtcgctc ttattgacca cacctctacc 900 ggcatgccca tgggttaactgatcaatgca tcctgcatgg cgcgcctgat gagcctgaac 960 tgcccgggca aatcagctggacgtctgcct gcattaatga atcggccaac gcgcggggag 1020 aggcggtttg cgtattgggcgctcttccgc ttcctcgctc actgactcgc tgcgctcggt 1080 cgttcggctg cggcgagcggtatcagctca ctcaaaggcg gtaatacggt tatccacaga 1140 atcaggggat aacgcaggaaagaacatgtg agcaaaaggc cagcaaaagg ccaggaaccg 1200 taaaaaggcc gcgttgctggcgtttttcca taggctccgc ccccctgacg agcatcacaa 1260 aaatcgacgc tcaagtcagaggtggcgaaa cccgacagga ctataaagat accaggcgtt 1320 tccccctgga agctccctcgtgcgctctcc tgttccgacc ctgccgctta ccggatacct 1380 gtccgccttt ctcccttcgggaagcgtggc gctttctcat agctcacgct gtaggtatct 1440 cagttcggtg taggtcgttcgctccaagct gggctgtgtg cacgaacccc ccgttcagcc 1500 cgaccgctgc gccttatccggtaactatcg tcttgagtcc aacccggtaa gacacgactt 1560 atcgccactg gcagcagccactggtaacag 
gattagcaga gcgaggtatg taggcggtgc 1620 tacagagttc ttgaagtggtggcctaacta cggctacact agaaggacag tatttggtat 1680 ctgcgctctg ctgaagccagttaccttcgg aaaaagagtt ggtagctctt gatccggcaa 1740 acaaaccacc gctggtagcggtggtttttt tgtttgcaag cagcagatta cgcgcagaaa 1800 aaaaggatct caagaagatcctttgatctt ttctacgggg tctgacgctc agtggaacga 1860 aaactcacgt taagggattttggtcatgag attatcaaaa aggatcttca cctagatcct 1920 tttaaattaa aaatgaagttttaaatcaat ctaaagtata tatgagtaaa cttggtctga 1980 cagttaccaa tgcttaatcagtgaggcacc tatctcagcg atctgtctat ttcgttcatc 2040 catagttgcc tgactccccgtcgtgtagat aactacgata cgggagggct taccatctgg 2100 ccccagtgct gcaatgataccgcgagaccc acgctcaccg gctccagatt tatcagcaat 2160 aaaccagcca gccggaagggccgagcgcag aagtggtcct gcaactttat ccgcctccat 2220 ccagtctatt aattgttgccgggaagctag agtaagtagt tcgccagtta atagtttgcg 2280 caacgttgtt gccattgctacaggcatcgt ggtgtcacgc tcgtcgtttg gtatggcttc 2340 attcagctcc ggttcccaacgatcaaggcg agttacatga tcccccatgt tgtgcaaaaa 2400 agcggttagc tccttcggtcctccgatcgt tgtcagaagt aagttggccg cagtgttatc 2460 actcatggtt atggcagcactgcataattc tcttactgtc atgccatccg taagatgctt 2520 ttctgtgact ggtgagtactcaaccaagtc attctgagaa tagtgtatgc ggcgaccgag 2580 ttgctcttgc ccggcgtcaatacgggataa taccgcgcca catagcagaa ctttaaaagt 2640 gctcatcatt ggaaaacgttcttcggggcg aaaactctca aggatcttac cgctgttgag 2700 atccagttcg atgtaacccactcgtgcacc caactgatct tcagcatctt ttactttcac 2760 cagcgtttct gggtgagcaaaaacaggaag gcaaaatgcc gcaaaaaagg gaataagggc 2820 gacacggaaa tgttgaatactcatactctt cctttttcaa tattattgaa gcatttatca 2880 gggttattgt ctcatgagcggatacatatt tgaatgtatt tagaaaaata aacaaatagg 2940 ggttccgcgc acatttccccgaaaagtgcc acctgacgcg ccctgtagcg gcgcattaag 3000 cgcggcgggt gtggtggttacgcgcagcgt gaccgctaca cttgccagcg ccctagcgcc 3060 cgctcctttc gctttcttcccttcctttct cgccacgttc gccggctttc cccgtcaagc 3120 tctaaatcgg gggctccctttagggttccg atttagtgct ttacggcacc tcgaccccaa 3180 aaaacttgat tagggtgatggttcacgtag tgggccatcg ccctgataga cggtttttcg 3240 ccctttgacg ttggagtccacgttctttaa tagtggactc ttgttccaaa ctggaacaac 3300 
actcaaccct atctcggtctattcttttga tttataaggg attttgccga tttcggccta 3360 ttggttaaaa aatgagctgatttaacaaaa atttaacgcg aattttaaca aaatattaac 3420 gcttacaatt tccattcgccattcaggctg cgcaactgtt gggaagggcg atcggtgcgg 3480 gcctcttcgc tattacgcca g3501 3 4188 DNA Synthetic, Vector EVE8 misc_feature (2673)..(3530)Ampicillin resistance gene 3 ctgatttgcc cgggcagttc aggctcatca ggcgcgccatgcagggattc tggaaattgc 60 aacgaaggaa gaaacctcgt tgctggaagc ctggaagaagtatcgggtgt tgctgaaccg 120 tgttgataca tcaactgcac ctgatattga gtggcctgctgtccctgtta tggagtaatc 180 gttttgtgat atgccgcaga aacgttgtat gaaataacgttctgcggtta gttagtatat 240 tgtaaagctg agtattggtt tatttggcga ttattatcttcaggagaata atggaagttc 300 tatgactcaa ttgttcatag tgtttacatc accgccaattgcttttaaga ctgaacgcat 360 gaaatatggt ttttcgtcat gttttgagtc tgctgttgatatttctaaag tcggtttttt 420 ttcttcgttt tctctaacta ttttccatga aatacatttttgattattat ttgaatcaat 480 tccaattacc tgaagtcttt catctataat tggcattgtatgtattggtt tattggagta 540 gatgcttgct tttctgagcc atagctctga tatcagatcttcttcggatg caagggttcg 600 aatcccttag ctctcattat tttttgcttt ttctcttgaggtcacatgat cgcaaaatgg 660 caaatggcac gtgaagctgt cgatattggg gaactgtggtggttggcaaa tgactaatta 720 agttagtcaa ggcgccatcc tcatgaaaac tgtgtaacataataaccgaa gtgtcgaaaa 780 ggtggcacct tgtccaattg aacacgctcg atgaaaaaaataagatatat ataaggttaa 840 gtaaagcgtc tgttagaaag gaagtttttc ctttttcttgctctcttgtc ttttcatcta 900 ctatttcctt cgtgtaatac agggtcgtca gatacatagatacaattcta ttacccccat 960 ccatacaagc ttggcgccga attcgtcgac ccggggatccgcggccgcag gcctaaattg 1020 atctagagct ttggacttct tcgccagagg tttggtcaagtctccaatca aggttgtcgg 1080 cttgtctacc ttgccagaaa tttacgaaaa gatggaaaagggtcaaatcg ttggtagata 1140 cgttgttgac acttctaaat aagcgaattt cttatgatttatgattttta ttattaaata 1200 agttataaaa aaaataagtg tatacaaatt ttaaagtgactcttaggttt taaaacgaaa 1260 attcttgttc ttgagtaact ctttcctgta ggtcaggttgctttctcagg tatagcatga 1320 ggtcgctctt attgaccaca cctctaccgg catgcccatgggttcttttg aaaagcaagc 1380 ataaaagatc taaacataaa atctgtaaaa taacaagatgtaaagataat gctaaatcat 1440 
ttggcttttt gattgattgt acaggaaaat atacatcgcagggggttgac ttttaccatt 1500 tcaccgcaat ggaatcaaac ttgttgaaga gaatgttcacaggcgcatac gctacaatga 1560 cccgattctt gctagccttt tctcggtctt gcaaacaaccgccaactgat caatgcatcc 1620 tgcatggcgc gcctgatgag cctgaactgc ccgggcaaatcagctggacg tctgcctgca 1680 ttaatgaatc ggccaacgcg cggggagagg cggtttgcgtattgggcgct cttccgcttc 1740 ctcgctcact gactcgctgc gctcggtcgt tcggctgcggcgagcggtat cagctcactc 1800 aaaggcggta atacggttat ccacagaatc aggggataacgcaggaaaga acatgtgagc 1860 aaaaggccag caaaaggcca ggaaccgtaa aaaggccgcgttgctggcgt ttttccatag 1920 gctccgcccc cctgacgagc atcacaaaaa tcgacgctcaagtcagaggt ggcgaaaccc 1980 gacaggacta taaagatacc aggcgtttcc ccctggaagctccctcgtgc gctctcctgt 2040 tccgaccctg ccgcttaccg gatacctgtc cgcctttctcccttcgggaa gcgtggcgct 2100 ttctcatagc tcacgctgta ggtatctcag ttcggtgtaggtcgttcgct ccaagctggg 2160 ctgtgtgcac gaaccccccg ttcagcccga ccgctgcgccttatccggta actatcgtct 2220 tgagtccaac ccggtaagac acgacttatc gccactggcagcagccactg gtaacaggat 2280 tagcagagcg aggtatgtag gcggtgctac agagttcttgaagtggtggc ctaactacgg 2340 ctacactaga aggacagtat ttggtatctg cgctctgctgaagccagtta ccttcggaaa 2400 aagagttggt agctcttgat ccggcaaaca aaccaccgctggtagcggtg gtttttttgt 2460 ttgcaagcag cagattacgc gcagaaaaaa aggatctcaagaagatcctt tgatcttttc 2520 tacggggtct gacgctcagt ggaacgaaaa ctcacgttaagggattttgg tcatgagatt 2580 atcaaaaagg atcttcacct agatcctttt aaattaaaaatgaagtttta aatcaatcta 2640 aagtatatat gagtaaactt ggtctgacag ttaccaatgcttaatcagtg aggcacctat 2700 ctcagcgatc tgtctatttc gttcatccat agttgcctgactccccgtcg tgtagataac 2760 tacgatacgg gagggcttac catctggccc cagtgctgcaatgataccgc gagacccacg 2820 ctcaccggct ccagatttat cagcaataaa ccagccagccggaagggccg agcgcagaag 2880 tggtcctgca actttatccg cctccatcca gtctattaattgttgccggg aagctagagt 2940 aagtagttcg ccagttaata gtttgcgcaa cgttgttgccattgctacag gcatcgtggt 3000 gtcacgctcg tcgtttggta tggcttcatt cagctccggttcccaacgat caaggcgagt 3060 tacatgatcc cccatgttgt gcaaaaaagc ggttagctccttcggtcctc cgatcgttgt 3120 cagaagtaag ttggccgcag tgttatcact 
catggttatggcagcactgc ataattctct 3180 tactgtcatg ccatccgtaa gatgcttttc tgtgactggtgagtactcaa ccaagtcatt 3240 ctgagaatag tgtatgcggc gaccgagttg ctcttgcccggcgtcaatac gggataatac 3300 cgcgccacat agcagaactt taaaagtgct catcattggaaaacgttctt cggggcgaaa 3360 actctcaagg atcttaccgc tgttgagatc cagttcgatgtaacccactc gtgcacccaa 3420 ctgatcttca gcatctttta ctttcaccag cgtttctgggtgagcaaaaa caggaaggca 3480 aaatgccgca aaaaagggaa taagggcgac acggaaatgttgaatactca tactcttcct 3540 ttttcaatat tattgaagca tttatcaggg ttattgtctcatgagcggat acatatttga 3600 atgtatttag aaaaataaac aaataggggt tccgcgcacatttccccgaa aagtgccacc 3660 tgacgcgccc tgtagcggcg cattaagcgc ggcgggtgtggtggttacgc gcagcgtgac 3720 cgctacactt gccagcgccc tagcgcccgc tcctttcgctttcttccctt cctttctcgc 3780 cacgttcgcc ggctttcccc gtcaagctct aaatcgggggctccctttag ggttccgatt 3840 tagtgcttta cggcacctcg accccaaaaa acttgattagggtgatggtt cacgtagtgg 3900 gccatcgccc tgatagacgg tttttcgccc tttgacgttggagtccacgt tctttaatag 3960 tggactcttg ttccaaactg gaacaacact caaccctatctcggtctatt cttttgattt 4020 ataagggatt ttgccgattt cggcctattg gttaaaaaatgagctgattt aacaaaaatt 4080 taacgcgaat tttaacaaaa tattaacgct tacaatttccattcgccatt caggctgcgc 4140 aactgttggg aagggcgatc ggtgcgggcc tcttcgctattacgccag 4188 4 11466 DNA Synthetic, Vector pYAC4-AscI misc_feature(3560)..(4247) Tetrahymena thermophila macronuclear telomere 4ttctcatgtt tgacagctta tcatcgataa gctttaatgc ggtagtttat cacagttaaa 60ttgctaacgc agtcaggcac cgtgtatgaa atctaacaat gcgctcatcg tcatcctcgg 120caccgtcacc ctggatgctg taggcatagg cttggttatg ccggtactgc cgggcctctt 180gcgggatatc gtccattccg acagcatcgc cagtcactat ggcgtgctgc tagcgctata 240tgcgttgatg caatttctat gcgcacccgt tctcggagca ctgtccgacc gctttggccg 300ccgcccagtc ctgctcgctt cgctacttgg agccactatc gactacgcga tcatggcgac 360cacacccgtc ctgtggatca attcccttta gtataaattt cactctgaac catcttggaa 420ggaccggtaa ttatttcaaa tctctttttc aattgtatat gtgttatgtt atgtagtata 480ctctttcttc aacaattaaa tactctcggt agccaagttg gtttaaggcg caagacttta 540atttatcact acggaattgg cgcgccaatt ccgtaatctt 
gagatcgggc gttcgatcgc 600cccgggagat ttttttgttt tttatgtctt ccattcactt cccagacttg caagttgaaa 660tatttctttc aagggaattg atcctctacg ccggacgcat cgtggccggc atcaccggcg 720ccacaggtgc ggttgctggc gcctatatcg ccgacatcac cgatggggaa gatcgggctc 780gccacttcgg gctcatgagc gcttgtttcg gcgtgggtat ggtggcaggc cccgtggccg 840ggggactgtt gggcgccatc tccttgcatg caccattcct tgcggcggcg gtgctcaacg 900gcctcaacct actactgggc tgcttcctaa tgcaggagtc gcataaggga gagcgtcgac 960cgatgccctt gagagccttc aacccagtca gctccttccg gtgggcgcgg ggcatgacta 1020tcgtcgccgc acttatgact gtcttcttta tcatgcaact cgtaggacag gtgccggcag 1080cgctctgggt cattttcggc gaggaccgct ttcgctggag cgcgacgatg atcggcctgt 1140cgcttgcggt attcggaatc ttgcacgccc tcgctcaagc cttcgtcact ggtcccgcca 1200ccaaacgttt cggcgagaag caggccatta tcgccggcat ggcggccgac gcgctgggct 1260acgtcttgct ggcgttcgcg acgcgaggct ggatggcctt ccccattatg attcttctcg 1320cttccggcgg catcgggatg cccgcgttgc aggccatgct gtccaggcag gtagatgacg 1380accatcaggg acagcttcaa ggatcgctcg cggctcttac cagcctaact tcgatcactg 1440gaccgctgat cgtcacggcg atttatgccg cctcggcgag cacatggaac gggttggcat 1500ggattgtagg cgccgcccta taccttgtct gcctccccgc gttgcgtcgc ggtgcatgga 1560gccgggccac ctcgacctga atggaagccg gcggcacctc gctaacggat tcaccactcc 1620aagaattgga gccaatcaat tcttgcggag aactgtgaat gcgcaaacca acccttggca 1680gaacatatcc atcgcgtccg ccatctccag cagccgcacg cggcgcatcc ccccccccct 1740ttcaattcaa ttcatcattt tttttttatt cttttttttg atttcggttt ctttgaaatt 1800tttttgattc ggtaatctcc gaacagaagg aagaacgaag gaaggagcac agacttagat 1860tggtatatat acgcatatgt agtgttgaag aaacatgaaa ttgcccagta ttcttaaccc 1920aactgcacag aacaaaaacc tgcaggaaac gaagataaat catgtcgaaa gctacatata 1980aggaacgtgc tgctactcat cctagtcctg ttgctgccaa gctatttaat atcatgcacg 2040aaaagcaaac aaacttgtgt gcttcattgg atgttcgtac caccaaggaa ttactggagt 2100tagttgaagc attaggtccc aaaatttgtt tactaaaaac acatgtggat atcttgactg 2160atttttccat ggagggcaca gttaagccgc taaaggcatt atccgccaag tacaattttt 2220tactcttcga agacagaaaa tttgctgaca ttggtaatac agtcaaattg cagtactctg 2280cgggtgtata 
cagaatagca gaatgggcag acattacgaa tgcacacggt gtggtgggcc 2340caggtattgt tagcggtttg aagcaggcgg cagaagaagt aacaaaggaa cctagaggcc 2400ttttgatgtt agcagaattg tcatgcaagg gctccctatc tactggagaa tatactaagg 2460gtactgttga cattgcgaag agcgacaaag attttgttat cggctttatt gctcaaagag 2520acatgggtgg aagagatgaa ggttacgatt ggttgattat gacacccggt gtgggtttag 2580atgacaaggg agacgcattg ggtcaacagt atagaaccgt ggatgatgtg gtctctacag 2640gatctgacat tattattgtt ggaagaggac tatttgcaaa gggaagggat gctaaggtag 2700agggtgaacg ttacagaaaa gcaggctggg aagcatattt gagaagatgc ggccagcaaa 2760actaaaaaac tgtattataa gtaaatgcat gtatactaaa ctcacaaatt agagcttcaa 2820tttaattata tcagttatta ctcgggcgta atgattttta taatgacgaa aaaaaaaaaa 2880ttggaaagaa aagggggggg gggcagcgtt gggtcctggc cacgggtgcg catgatcgtg 2940ctcctgtcgt tgaggacccg gctaggctgg cggggttgcc ttactggtta gcagaatgaa 3000tcaccgatac gcgagcgaac gtgaagcgac tgctgctgca aaacgtctgc gacctgagca 3060acaacatgaa tggtcttcgg tttccgtgtt tcgtaaagtc tggaaacgcg gaagtcagcg 3120ccctgcacca ttatgttccg gatctgcatc gcaggatgct gctggctacc ctgtggaaca 3180cctacatctg tattaacgaa gcgctggcat tgaccctgag tgatttttct ctggtcccgc 3240cgcatccata ccgccagttg tttaccctca caacgttcca gtaaccgggc atgttcatca 3300tcagtaaccc gtatcgtgag catcctctct cgtttcatcg gtatcattac ccccatgaac 3360agaaattccc ccttacacgg aggcatcaag tgaccaaaca ggaaaaaacc gcccttaaca 3420tggcccgctt tatcagaagc cagacattaa cgcttctgga gaaactcaac gagctggacg 3480cggatgaaca ggcagacatc tgtgaatcgc ttcacgacca cgctgatgag ctttaccgca 3540gccctcgagg gataagcttc atttttagat aaaatttatt aatcatcatt aatttcttga 3600aaaacatttt atttattgat cttttataac aaaaaaccct tctaaaagtt tatttttgaa 3660tgaaaaactt ataaaaattt atgaaaacta caaaaaataa aatttttaat taaaataatt 3720ttgataagaa cttcaatctt tgactagcta gcttagtcat ttttgagatt taattaatat 3780tttatgttta ttcatatata aactattcaa aatattatag aatttaaaca ttttaacatc 3840ttaatcattc ataaataact aaaaatcaaa gtattacatc aataaataac ttttactcaa 3900tgtcaaagaa ttattggggt tggggttggg gttggggttg gggttggggt tggggttggg 3960gttggggttg gggttggggt tggggttggg gttggggttg 
gggttggggt tggggttggg 4020gttggggttg gggttggggt tggggttggg gttggggttg gggttggggt tggggttggg 4080gttggggttg gggttggggt tggggttggg gttggggttg gggttggggt tggggttggg 4140gttggggttg gggttggggt tggggttggg gttggggttg gggtgggaaa acagcattca 4200ggtattagaa gaatatcctg attcaggtga aaatattgtt gatgcgcggg atcctcgggg 4260acaccaaata tggcgatctc ggccttttcg tttcttggag ctgggacatg tttgccatcg 4320atccatctac caccagaacg gccgttagat ctgctgccac cgttgtttcc accgaagaaa 4380ccaccgttgc cgtaaccacc acgacggttg ttgctaaaga agctgccacc gccacggcca 4440ccgttgtagc cgccgttgtt gttattgtag ttgctcatgt tatttctggc acttcttggt 4500tttcctctta agtgaggagg aacataacca ttctcgttgt tgtcgttgat gcttaaattt 4560tgcacttgtt cgctcagttc agccataata tgaaatgctt ttcttgttgt tcttacggaa 4620taccacttgc cacctatcac cacaactaac tttttcccgt tcctccatct cttttatatt 4680ttttttctcg atcgagttca agagaaaaaa aaagaaaaag caaaaagaaa aaaggaaagc 4740gcgcctcgtt cagaatgaca cgtatagaat gatgcattac cttgtcatct tcagtatcat 4800actgttcgta tacatactta ctgacattca taggtataca tatatacaca tgtatatata 4860tcgtatgctg cagctttaaa taatcggtgt cactacataa gaacaccttt ggtggaggga 4920acatcgttgg taccattggg cgaggtggct tctcttatgg caaccgcaag agccttgaac 4980gcactctcac tacggtgatg atcattcttg cctcgcagac aatcaacgtg gagggtaatt 5040ctgctagcct ctgcaaagct ttcaagaaaa tgcgggatca tctcgcaaga gagatctcct 5100actttctccc tttgcaaacc aagttcgaca actgcgtacg gcctgttcga aagatctacc 5160accgctctgg aaagtgcctc atccaaaggc gcaaatcctg atccaaacct ttttactcca 5220cgcgccagta gggcctcttt aaaagcttga ccgagagcaa tcccgcagtc ttcagtggtg 5280tgatggtcgt ctatgtgtaa gtcaccaatg cactcaacga ttagcgacca gccggaatgc 5340ttggccagag catgtatcat atggtccaga aaccctatac ctgtgtggac gttaatcact 5400tgcgattgtg tggcctgttc tgctactgct tctgcctctt tttctgggaa gatcgagtgc 5460tctatcgcta ggggaccacc ctttaaagag atcgcaatct gaatcttggt ttcatttgta 5520atacgcttta ctagggcttt ctgctctgtc atctttgcct tcgtttatct tgcctgctca 5580ttttttagta tattcttcga agaaatcaca ttactttata taatgtataa ttcattatgt 5640gataatgcca atcgctaaga aaaaaaaaga gtcatccgct aggtggaaaa aaaaaaatga 5700aaatcattac 
cgaggcataa aaaaatatag agtgtactag aggaggccaa gagtaataga 5760aaaagaaaat tgcgggaaag gactgtgtta tgacttccct gactaatgcc gtgttcaaac 5820gatacctggc agtgactcct agcgctcacc aagctcttaa aacgagaatt aagaaaaagt 5880cgtcatcttt cgataagttt ttcccacagc aaagcaatag tagaaaaaaa caatgggaaa 5940cgttgaatga agacaaagcg tcgtggttta aaaggaaata cgctcacgta catgctaggg 6000aacaggaccg tgcagcggat cccgcgcatc aacaatattt tcacctgaat caggatattc 6060ttctaatacc tgaatgctgt tttcccaccc caaccccaac cccaacccca accccaaccc 6120caaccccaac cccaacccca accccaaccc caaccccaac cccaacccca accccaaccc 6180caaccccaac cccaacccca accccaaccc caaccccaac cccaacccca accccaaccc 6240caaccccaac cccaacccca accccaaccc caaccccaac cccaacccca accccaaccc 6300caaccccaac cccaacccca accccaaccc caaccccaac cccaacccca accccaataa 6360ttctttgaca ttgagtaaaa gttatttatt gatgtaatac tttgattttt agttatttat 6420gaatgattaa gatgttaaaa tgtttaaatt ctataatatt ttgaatagtt tatatatgaa 6480taaacataaa atattaatta aatctcaaaa atgactaagc tagctagtca aagattgaag 6540ttcttatcaa aattatttta attaaaaatt ttattttttg tagttttcat aaatttttat 6600aagtttttca ttcaaaaata aacttttaga agggtttttt gttataaaag atcaataaat 6660aaaatgtttt tcaagaaatt aatgatgatt aataaatttt atctaaaaat gaagcttatc 6720cctcgagggc tgcctcgcgc gtttcggtga tgacggtgaa aacctctgac acatgcagct 6780cccggagacg gtcacagctt gtctgtaagc ggatgccggg agcagacaag cccgtcaggg 6840cgcgtcagcg ggtgttggcg ggtgtcgggg cgcagccatg acccagtcac gtagcgatag 6900cggagtgtat actggcttaa ctatgcggca tcagagcaga ttgtactgag agtgcaccat 6960atgcggtgtg aaataccgca cagatgcgta aggagaaaat accgcatcag gcgctcttcc 7020gcttcctcgc tcactgactc gctgcgctcg gtcgttcggc tgcggcgagc ggtatcagct 7080cactcaaagg cggtaatacg gttatccaca gaatcagggg ataacgcagg aaagaacatg 7140tgagcaaaag gccagcaaaa ggccaggaac cgtaaaaagg ccgcgttgct ggcgtttttc 7200cataggctcc gcccccctga cgagcatcac aaaaatcgac gctcaagtca gaggtggcga 7260aacccgacag gactataaag ataccaggcg tttccccctg gaagctccct cgtgcgctct 7320cctgttccga ccctgccgct taccggatac ctgtccgcct ttctcccttc gggaagcgtg 7380gcgctttctc atagctcacg ctgtaggtat ctcagttcgg 
tgtaggtcgt tcgctccaag 7440ctgggctgtg tgcacgaacc ccccgttcag cccgaccgct gcgccttatc cggtaactat 7500cgtcttgagt ccaacccggt aagacacgac ttatcgccac tggcagcagc cactggtaac 7560aggattagca gagcgaggta tgtaggcggt gctacagagt tcttgaagtg gtggcctaac 7620tacggctaca ctagaaggac agtatttggt atctgcgctc tgctgaagcc agttaccttc 7680ggaaaaagag ttggtagctc ttgatccggc aaacaaacca ccgctggtag cggtggtttt 7740tttgtttgca agcagcagat tacgcgcaga aaaaaaggat ctcaagaaga tcctttgatc 7800ttttctacgg ggtctgacgc tcagtggaac gaaaactcac gttaagggat tttggtcatg 7860agattatcaa aaaggatctt cacctagatc cttttaaatt aaaaatgaag ttttaaatca 7920atctaaagta tatatgagta aacttggtct gacagttacc aatgcttaat cagtgaggca 7980cctatctcag cgatctgtct atttcgttca tccatagttg cctgactccc cgtcgtgtag 8040ataactacga tacgggaggg cttaccatct ggccccagtg ctgcaatgat accgcgagac 8100ccacgctcac cggctccaga tttatcagca ataaaccagc cagccggaag ggccgagcgc 8160agaagtggtc ctgcaacttt atccgcctcc atccagtcta ttaattgttg ccgggaagct 8220agagtaagta gttcgccagt taatagtttg cgcaacgttg ttgccattgc tgcaggcatc 8280gtggtgtcac gctcgtcgtt tggtatggct tcattcagct ccggttccca acgatcaagg 8340cgagttacat gatcccccat gttgtgcaaa aaagcggtta gctccttcgg tcctccgatc 8400gttgtcagaa gtaagttggc cgcagtgtta tcactcatgg ttatggcagc actgcataat 8460tctcttactg tcatgccatc cgtaagatgc ttttctgtga ctggtgagta ctcaaccaag 8520tcattctgag aatagtgtat gcggcgaccg agttgctctt gcccggcgtc aacacgggat 8580aataccgcgc cacatagcag aactttaaaa gtgctcatca ttggaaaacg ttcttcgggg 8640cgaaaactct caaggatctt accgctgttg agatccagtt cgatgtaacc cactcgtgca 8700cccaactgat cttcagcatc ttttactttc accagcgttt ctgggtgagc aaaaacagga 8760aggcaaaatg ccgcaaaaaa gggaataagg gcgacacgga aatgttgaat actcatactc 8820ttcctttttc aatattattg aagcatttat cagggttatt gtctcatgag cggatacata 8880tttgaatgta tttagaaaaa taaacaaata ggggttccgc gcacatttcc ccgaaaagtg 8940ccacctgacg tctaagaaac cattattatc atgacattaa cctataaaaa taggcgtatc 9000acgaggccct ttcgtcttca agaattaatt cggtcgaaaa aagaaaagga gagggccaag 9060agggagggca ttggtgacta ttgagcacgt gagtatacgt gattaagcac acaaaggcag 9120cttggagtat 
gtctgttatt aatttcacag gtagttctgg tccattggtg aaagtttgcg 9180gcttgcagag cacagaggcc gcagaatgtg ctctagattc cgatgctgac ttgctgggta 9240ttatatgtgt gcccaataga aagagaacaa ttgacccggt tattgcaagg aaaatttcaa 9300gtcttgtaaa agcatataaa aatagttcag gcactccgaa atacttggtt ggcgtgtttc 9360gtaatcaacc taaggaggat gttttggctc tggtcaatga ttacggcatt gatatcgtcc 9420aactgcatgg agatgagtcg tggcaagaat accaagagtt cctcggtttg ccagttatta 9480aaagactcgt atttccaaaa gactgcaaca tactactcag tgcagcttca cagaaacctc 9540attcgtttat tcccttgttt gattcagaag caggtgggac aggtgaactt ttggattgga 9600actcgatttc tgactgggtt ggaaggcaag agagccccga aagcttacat tttatgttag 9660ctggtggact gacgccagaa aatgttggtg atgcgcttag attaaatggc gttattggtg 9720ttgatgtaag cggaggtgtg gagacaaatg gtgtaaaaga ctctaacaaa atagcaaatt 9780tcgtcaaaaa tgctaagaaa taggttatta ctgagtagta tttatttaag tattgtttgt 9840gcacttgcct gcaggccttt tgaaaagcaa gcataaaaga tctaaacata aaatctgtaa 9900aataacaaga tgtaaagata atgctaaatc atttggcttt ttgattgatt gtacaggaaa 9960atatacatcg cagggggttg acttttacca tttcaccgca atggaatcaa acttgttgaa 10020gagaatgttc acaggcgcat acgctacaat gacccgattc ttgctagcct tttctcggtc 10080ttgcaaacaa ccgccggcag cttagtatat aaatacacat gtacatacct ctctccgtat 10140cctcgtaatc attttcttgt atttatcgtc ttttcgctgt aaaaacttta tcacacttat 10200ctcaaataca cttattaacc gcttttacta ttatcttcta cgctgacagt aatatcaaac 10260agtgacacat attaaacaca gtggtttctt tgcataaaca ccatcagcct caagtcgtca 10320agtaaagatt tcgtgttcat gcagatagat aacaatctat atgttgataa ttagcgttgc 10380ctcatcaatg cgagatccgt ttaaccggac cctagtgcac ttaccccacg ttcggtccac 10440tgtgtgccga acatgctcct tcactatttt aacatgtgga attaattcta aatcctcttt 10500atatgatctg ccgatagata gttctaagtc attgaggttc atcaacaatt ggattttctg 10560tttactcgac ttcaggtaaa tgaaatgaga tgatacttgc ttatctcata gttaactcta 10620agaggtgata cttatttact gtaaaactgt gacgataaaa ccggaaggaa gaataagaaa 10680actcgaactg atctataatg cctattttct gtaaagagtt taagctatga aagcctcggc 10740attttggccg ctcctaggta gtgctttttt tccaaggaca aaacagtttc tttttcttga 10800gcaggtttta tgtttcggta atcataaaca 
ataaataaat tatttcattt atgtttaaaa 10860ataaaaaata aaaaagtatt ttaaattttt aaaaaagttg attataagca tgtgaccttt 10920tgcaagcaat taaattttgc aatttgtgat tttaggcaaa agttacaatt tctggctcgt 10980gtaatatatg tatgctaaag tgaactttta caaagtcgat atggacttag tcaaaagaaa 11040ttttcttaaa aatatatagc actagccaat ttagcacttc tttatgagat atattataga 11100ctttattaag ccagatttgt gtattatatg tatttacccg gcgaatcatg gacatacatt 11160ctgaaatagg taatattctc tatggtgaga cagcatagat aacctaggat acaagttaaa 11220agctagtact gttttgcagt aatttttttc ttttttataa gaatgttacc acctaaataa 11280gttataaagt caatagttaa gtttgatatt tgattgtaaa ataccgtaat atatttgcat 11340gatcaaaagg ctcaatgttg actagccagc atgtcaacca ctatattgat caccgatata 11400tggacttcca caccaactag taatatgaca ataaattcaa gatattcttc atgagaatgg 11460cccaga 11466

1. A nucleotide concatemer comprising in the 5′→3′ direction a cassette of nucleotide sequence of the general formula [rs₂-SP—PR—X-TR—SP-rs₁]_(n) wherein rs₁ and rs₂ together denote a functional restriction site, SP individually denotes a spacer of at least two nucleotide bases, PR denotes a promoter, capable of functioning in a cell, X denotes an expressible nucleotide sequence, TR denotes a terminator, and SP individually denotes a spacer of at least two nucleotide bases, and n≧2, and wherein at least a first cassette is different from a second cassette.
 2. The concatemer according to claim 1, wherein the nucleotide sequence comprises a DNA sequence selected from the group comprising cDNA, genomic DNA.
 3. The concatemer according to claim 1, wherein the nucleotide sequence is single stranded, or partly single stranded.
 4. The concatemer according to claim 1, wherein the nucleotide sequence is double stranded.
 5. The concatemer according to any of the preceding claims 1 to 4, comprising nucleotide sequences from at least one expression state.
 6. The concatemer according to any of the preceding claims 1 to 5, comprising nucleotide sequences from at least two expression states.
 7. The concatemer according to any of the preceding claims 1 to 6, wherein the rs₁-rs₂ restriction sites of at least two cassettes are recognised by the same restriction enzyme, more preferably are identical.
 8. The concatemer according to claim 7, wherein the rs₁-rs₂ restriction sites of essentially all cassettes are recognised by the same restriction enzyme, more preferably are identical.
 9. The concatemer according to any of the preceding claims 1 to 8, wherein substantially all cassettes are different.
 10. The concatemer according to any of claims 1 to 9, wherein at least one cassette comprises an intron between the promoter and the expressible nucleotide sequence, more preferably substantially all cassettes comprise an intron between the promoter and the expressible nucleotide sequence.
 11. The concatemer according to any of the preceding claims 1 to 10, wherein the difference comprises different promoters, and/or different expressible nucleotide sequences, and/or different spacers and/or different terminators and/or different introns.
 12. The concatemer according to any of the preceding claims 1 to 11, wherein n is at least 10, such as at least 15, for example at least 20, such as at least 25, for example at least 30, such as from 30 to 60 or more than 60, such as at least 75, for example at least 100, such as at least 200, for example at least 500, such as at least 750, for example at least 1000, such as at least 1500, for example at least 2000.
 13. The concatemer according to any of the preceding claims 1 to 12, wherein at least one cassette comprises the cassette from a primary vector according to claims 63 to 98, more preferably substantially all cassettes comprise the cassette from a primary vector according to claims 63 to 98.
 14. The concatemer according to any of the preceding claims 1 to 13, comprised in an artificial chromosome.
 15. The concatemer according to claim 14, wherein the artificial chromosome is selected from the group comprising a Yeast Artificial Chromosome, a mega Yeast Artificial Chromosome, a Bacterial Artificial Chromosome, a mouse artificial chromosome, a Mammalian Artificial Chromosome, an Insect Artificial Chromosome, an Avian Artificial Chromosome, a Bacteriophage Artificial Chromosome, a Baculovirus Artificial Chromosome, or a Human Artificial Chromosome.
 16. The concatemer according to any of the preceding claims 1 to 13, comprised in a plasmid or an insertion vector, such as for example yeast integrative plasmid (YIp), Yeast replicating plasmid (YRp), Yeast Episomal plasmid (YEp), Yeast centromeric plasmid (YCp), Yeast linear plasmid (YLp), Yeast expression plasmid (YXp), Yeast retrotransposons (Ty elements), Yeast killer plasmid, Yeast disintegration plasmid (YDp).
 17. The concatemer according to any of claims 14 to 16, wherein the vector further comprises at least one selectable genetic marker, such as a repressive or a dominant marker.
 18. The concatemer according to claim 17, comprising two selectable genetic markers.
 19. The concatemer according to claim 17 or 18, wherein the marker comprises a marker selected from the group comprising LEU 2, TRP 1, HIS 3, LYS 2, URA 3, ADE 2, Amyloglucosidase, β-lactamase, CUP 1, G418^(R), TUN^(R), KiLk1, C230, SMR1, SFA, Hygromycin^(R), methotrexate^(R), chloramphenicol^(R), Diuron^(R), Zeocin^(R), Canavanine^(R).
 20. The concatemer according to any of claims 1 to 19, wherein different expressible nucleotide sequences come from the same or from different expression states.
 21. The concatemer according to claim 20, wherein the different expression states represent at least two different tissues, such as at least two organs, such as at least two species, such as at least two genera.
 22. The concatemer according to claim 21, wherein the different species are from at least two different phyla, such as from at least two different classes, such as from at least two different divisions, more preferably from at least two different sub-kingdoms, such as from at least two different kingdoms.
 23. The concatemer according to claim 21, wherein one species is a eukaryote and another species is a prokaryote.
 24. The concatemer according to any of the preceding claims 1 to 23, being designed to minimise the level of repeat sequences occurring in the concatemer.
 25. A method for concatenation comprising the steps of concatenating at least two cassettes of nucleotide sequences each cassette comprising a first sticky end, a spacer sequence, a promoter, an expressible nucleotide sequence, a terminator, a spacer sequence, and a second sticky end.
 26. The method according to claim 25, further comprising starting from a primary vector [RS1-RS2-SP—PR—X-TR—SP—RS2′-RS1′], wherein X denotes an expressible nucleotide sequence, RS1 and RS1′ denote restriction sites, RS2 and RS2′ denote restriction sites different from RS1 and RS1′, SP individually denotes a spacer sequence of at least two nucleotides, PR denotes a promoter, TR denotes a terminator, iii) cutting the primary vector with the aid of at least one restriction enzyme specific for RS2 and RS2′ obtaining cassettes having the general formula [rs₂-SP—PR—X-TR—SP-rs₁] wherein rs₁ and rs₂ together denote a functional restriction site RS2 or RS2′, iv) assembling the cut out cassettes through interaction between rs₁ and rs₂.
 27. The method according to claim 25 or 26, comprising concatenating at least 10 cassettes, such as at least 15, for example at least 20, such as at least 25, for example at least 30, such as from 30 to 60 or more than 60, such as at least 75, for example at least 100, such as at least 200, for example at least 500, such as at least 750, for example at least 1000, such as at least 1500, for example at least 2000.
 28. The method according to claim 26, further comprising addition of vector arms each having a RS2 or RS2′ in one end and a non-complementary overhang or a blunt end in the other end.
 29. The method according to claim 27, whereby the ratio of vector arms to cassettes determines the number of cassettes in the concatemer.
 30. The method according to claim 27 or 29, wherein the vector arms are artificial chromosome vector arms.
 31. The method according to claim 26, further comprising addition of stopper fragments, the stopper fragments each having a RS2 or RS2′ in one end and a non-complementary overhang or a blunt end in the other end.
 32. The method according to claim 31, further comprising ligating vector arms to the stopper fragments.
 33. The method according to claim 26, further comprising iv) isolating mRNA from an expression state, v) obtaining substantially full length cDNA clones corresponding to the mRNA sequences, vi) inserting the substantially full length cDNA clones into a cloning site in a cassette in a primary vector, said cassette being of the general formula in 5′→3′ direction: [RS1-RS2-SP—PR—CS-TR—SP—RS2′-RS1′] wherein CS denotes a cloning site.
 34. The method according to claim 26, wherein RS1 and RS1′ are restriction sites leaving blunt ends, and RS2 and RS2′ are restriction sites leaving compatible sticky ends.
 35. The method according to claim 26, wherein RS1 and RS1′ are identical, and wherein RS2 and RS2′ are identical.
 36. The method according to claim 26, wherein RS2 and RS2′ have palindromic overhangs.
 37. The method according to claim 26, wherein RS2 and RS2′ have non-palindromic overhangs.
 38. The method according to any of the preceding claims 25 to 37, further comprising selection of vectors having expressible nucleotide sequences from at least two different expression states, such as from two different species.
 39. The method according to claim 38, whereby the two different species are from two different classes, such as from two different divisions, more preferably from two different sub-kingdoms, such as from two different kingdoms.
 40. The method according to any of the claims 25 to 39, whereby the concatemer is ligated into an artificial chromosome selected from the group comprising yeast artificial chromosome, mega yeast artificial chromosome, bacterial artificial chromosome, mouse artificial chromosome, human artificial chromosome.
 41. The method according to any of the preceding claims 26 to 40, whereby RS2 and RS2′ in at least two cassettes are cleaved by one restriction enzyme, preferably RS2 and RS2′ in substantially all cassettes are cleaved by one restriction enzyme.
 42. A cell comprising at least one concatemer of individual oligonucleotide cassettes, each concatemer comprising oligonucleotide of the following formula in 5′→3′ direction: [rs₂-SP—PR—X-TR—SP-rs₁]_(n) wherein rs₁ and rs₂ together denote a restriction site, SP individually denotes a spacer of at least two nucleotide bases, PR denotes a promoter, capable of functioning in the cell, X denotes an expressible nucleotide sequence, TR denotes a terminator, and SP individually denotes a spacer of at least two nucleotide bases, wherein n≧2, and wherein at least two expressible nucleotide sequences are from different expression states.
 43. A cell comprising at least one concatemer of individual oligonucleotide cassettes, each concatemer comprising oligonucleotide of the following formula in 5′→3′ direction: [rs₂-SP—PR—X-TR—SP-rs₁]_(n) wherein rs₁ and rs₂ together denote a restriction site, SP individually denotes a spacer of at least two nucleotide bases, PR denotes a promoter, capable of functioning in the cell, X denotes an expressible nucleotide sequence, TR denotes a terminator, and SP individually denotes a spacer of at least two nucleotide bases, wherein n≧2, and wherein rs₁-rs₂ in at least two cassettes is recognised by the same restriction enzyme.
 44. The cell according to claim 42 or 43, wherein substantially all rs₁-rs₂ sequences are recognised by the same restriction enzyme, more preferably wherein substantially all rs₁-rs₂ sequences are substantially identical.
 45. The cell according to any of claims 42 to 44, wherein n is at least 10, such as at least 15, for example at least 20, such as at least 25, for example at least 30, such as from 30 to 60 or more than 60, such as at least 75, for example at least 100, such as at least 200, for example at least 500, such as at least 750, for example at least 1000, such as at least 1500, for example at least 2000.
 46. The cell according to any of claims 42 to 45, comprising 2 concatemers per cell, for example 3 per cell, such as at least 4 per cell.
 47. The cell according to any of claims 42 to 46, wherein at least one cassette comprises an intron between the promoter and the expressible nucleotide sequence, more preferably substantially all cassettes comprise an intron between the promoter and the expressible nucleotide sequence.
 48. The cell according to any of claims 43 to 47, comprising a eukaryotic cell selected from the group comprising: yeasts; filamentous ascomycetes such as Neurospora crassa and Aspergillus nidulans; plant cells such as those derived from Nicotiana and Arabidopsis; mammalian host cells such as those derived from humans, monkeys and rodents, such as Chinese hamster ovary (CHO) cells, NIH/3T3, COS, 293, VERO, HeLa.
 49. The cell according to claim 48, being a yeast cell selected from the group comprising baker's yeast, Kluyveromyces marxianus, K. lactis, Candida utilis, Phaffia rhodozyma, Saccharomyces boulardii, Pichia pastoris, Hansenula polymorpha, Yarrowia lipolytica, Candida paraffinica, Schwanniomyces castellii, Pichia stipitis, Candida shehatae, Rhodotorula glutinis, Lipomyces lipofer, Cryptococcus curvatus, Candida spp. (e.g. C. palmioleophila), Yarrowia lipolytica, Candida guilliermondii, Candida, Rhodotorula spp., Saccharomycopsis spp., Aureobasidium pullulans, Candida brumptii, Candida hydrocarbofumarica, Torulopsis, Candida tropicalis, Saccharomyces cerevisiae, Rhodotorula rubra, Candida flaveri, Eremothecium ashbyii, Pichia spp., Kluyveromyces, Hansenula, Kloeckera, Pichia, Pachysolen spp., or Torulopsis bombicola.
 50. Thecell according to any of the preceding claims 433 to 49, having amutation in a central biosynthetic pathway.
 51. The cell according toclaim 500, comprising an inserted selectable genetic markercomplementing the mutation.
 52. The cell according to any of thepreceding claims 433 to 511, comprising a selectable genetic marker. 53.The cell according to any of claims 433 to 522, wherein the nucleotidesequence of at least one concatemer, preferably the nucleotide sequencefrom substantially all concatemers have been designed to minimise thelevel of repeat sequences in any one concatemer.
 54. The cell according to claim 53, wherein recombination within the expressible nucleotide sequence has been minimised.
 55. The cell according to any of the preceding claims 43 to 54, wherein at least one concatemer, preferably substantially all concatemers is/are concatemer/s according to claims 1 to 24.
 56. A method for producing a transgenic cell comprising inserting into a host cell a concatemer comprising a heterologous nucleotide sequence comprising at least two genes each controlled by a promoter, wherein the two promoters are different.
 57. The method according to claim 56, whereby the inserted genes come from at least two different expression states.
 58. The method according to claim 57, whereby the expression states are comprised in different species.
 59. The method according to claim 58, whereby the different species are comprised in different kingdoms.
 60. The method according to any of the preceding claims 56 to 59, comprising insertion of a concatemer according to claims 1 to 24.
 61. The method according to any of the preceding claims 56 to 60, further comprising selecting for cells comprising at least one stably maintained concatemer.
 62. The method according to claim 61, whereby selection comprises selection of cells carrying at least one selectable genetic marker on an artificial chromosome, more preferably two selectable genetic markers on an artificial chromosome.
 63. A primary vector comprising a nucleotide sequence cassette of the general formula in 5′→3′ direction: [RS1-RS2-SP-PR-CS-TR-SP-RS2′-RS1′] wherein RS1 and RS1′ denote restriction sites, RS2 and RS2′ denote restriction sites different from RS1 and RS1′, SP individually denotes a spacer sequence of at least two nucleotides, PR denotes a promoter, CS denotes a cloning site, TR denotes a terminator.
 64. The vector according to claim 63, wherein the nucleotide sequence is a DNA sequence.
 65. The vector according to claim 63, wherein the nucleotide sequence is double stranded.
 66. The vector according to any of the preceding claims 63 to 65, further comprising an intron sequence between the promoter and the cloning site and/or between the cloning site and the terminator.
 67. The vector according to any of the preceding claims 63 to 66, wherein the cloning site comprises an expressible nucleotide sequence.
 68. The vector according to claim 67, wherein the expressible nucleotide sequence comprises substantially full length cDNA.
 69. The vector according to claim 67, wherein the expressible nucleotide sequence comprises genomic DNA.
 70. The vector according to any of the preceding claims 63 to 69, wherein any of RS1, RS1′, RS2, RS2′ comprise a rare restriction site selected from those of Example 6.
 71. The vector according to claim 70, wherein the recognition sequence for RS1, RS1′, RS2 and/or RS2′ comprises at least 6 bases such as at least 8 bases, for example at least 10 bases.
 72. The vector according to claim 71, wherein the recognition sequence comprises a bipartite sequence.
 73. The vector according to claim 71, wherein the GC content of the recognition sequence is more than 40%, preferably more than 50%, more preferably equal to or more than 60%.
 74. The vector according to any of the preceding claims 63 to 73, wherein the restriction enzyme recognising RS2 and RS2′ produces sticky ends upon cleavage of a double stranded nucleotide sequence, preferably wherein the sticky ends have a pre-determined nucleotide sequence.
 75. The vector according to any of the preceding claims 63 to 74, wherein RS2 and RS2′ are identical.
 76. The method according to claim 75, wherein the RS2 and/or RS2′ overhang is a palindromic sequence.
 77. The method according to claim 75, wherein the RS2 and/or RS2′ overhang is a non-palindromic sequence.
 78. The vector according to any of the preceding claims 63 to 75, wherein the restriction enzyme recognising RS1 and RS1′ produces blunt ends upon cleavage of a double stranded nucleotide sequence.
 79. The vector according to any of the preceding claims 63 to 75, wherein the restriction enzyme recognising RS1 and RS1′ produces sticky ends with a nucleotide sequence being non-compatible with the nucleotide sequence of sticky ends produced upon cleavage of RS2 and RS2′.
 80. The vector according to any of the preceding claims 63 to 79, wherein RS1 and RS1′ are identical.
 81. The vector according to any of the preceding claims 63 to 80, further comprising a spacer sequence between TR and RS2′.
 82. The vector according to any of the preceding claims 63 to 81, wherein the spacer and the optional spacer sequence together comprise at least 100 bases, such as at least 250 bases, such as at least 500 bases, such as at least 750 bases, for example at least 1000 bases, such as at least 1100 bases, for example at least 1200 bases, such as at least 1300 bases, for example at least 1400 bases, such as at least 1500 bases, for example at least 1600 bases, such as at least 1700 bases, for example at least 1800 bases, such as at least 1900 bases, for example at least 2000 bases, such as at least 2100 bases, for example at least 2200 bases, such as at least 2300 bases, for example at least 2400 bases, such as at least 2500 bases, for example at least 2600 bases, such as at least 2700 bases, for example at least 2800 bases, such as at least 2900 bases, for example at least 3000 bases, such as at least 3200 bases, for example at least 3500 bases, such as at least 3800 bases, for example at least 4000 bases, such as at least 4500 bases, for example at least 5000 bases, such as at least 6000 bases.
 83. The vector according to claims 81 or 82, wherein at least one of the spacer sequences comprises between 100 and 2500 bases, preferably between 200 and 2300 bases, more preferably between 300 and 2100 bases, such as between 400 and 1900 bases, more preferably between 500 and 1700 bases, such as between 600 and 1500 bases, more preferably between 700 and 1400 bases.
 84. The vector according to any of the preceding claims 63 to 83, wherein the promoter is an externally controllable promoter.
 85. The vector according to any of the preceding claims 63 to 84, wherein the promoter comprises an inducible promoter or wherein the promoter comprises a repressible promoter.
 86. The vector according to any of the preceding claims 63 to 85, wherein the promoter comprises both repressible and inducible elements.
 87. The vector according to any of the preceding claims 63 to 86, wherein the promoter is chemically inducible and/or repressible and/or inducible/repressible by temperature.
 88. The vector according to any of the preceding claims 63 to 87, wherein the promoter is induced and/or repressed by any factor selected from the group comprising carbohydrates, e.g. galactose; low inorganic phosphate levels; temperature, e.g. low or high temperature shift; metals or metal ions, e.g. copper ions; hormones, e.g. dihydrotestosterone; deoxycorticosterone; heat shock (e.g. 39° C.); methanol; redox-status; growth stage, e.g. developmental stage; synthetic inducers, e.g. the gal inducer.
 89. The vector according to any of the preceding claims 63 to 88, wherein the promoter comprises a promoter selected from the group comprising ADH 1, PGK 1, GAP 491, TPI, PYK, ENO, PMA 1, PHO5, GAL 1, GAL 2, GAL 10, MET25, ADH2, MEL 1, CUP 1, HSE, AOX, MOX, SV40, CaMV, Opaque-2, GRE, ARE, PGK/ARE hybrid, CYC/GRE hybrid, TPI/α2 operator, AOX 1, MOX A.
 90. The vector according to claim 89, wherein the promoter is selected from hybrid promoters including PGK/ARE hybrid, CYC/GRE hybrid.
 91. The vector according to any of the preceding claims 63 to 90, wherein the promoter is a synthetic promoter.
 92. The vector according to any of the preceding claims 63 to 91, wherein the cloning site allows directional cloning.
 93. The vector according to any of the preceding claims 63 to 92, wherein the cloning site comprises multiple cloning sites, such as a polylinker site, the cloning site preferably encoding a series of restriction endonuclease recognition sites.
 94. The vector according to any of the preceding claims 63 to 93, wherein the promoter and terminator are capable of functioning in an expression host cell, preferably in a yeast cell.
 95. The vector according to any of the preceding claims 63 to 94, wherein the primary vector comprising the cassette is a plasmid vector having a high copy number, being capable of being propagated in E. coli, and having a selectable marker for maintenance in E. coli.
 96. The vector according to claim 95, wherein the primary vector can be made single-stranded.
 97. The vector according to claim 96, further comprising an origin of replication in the vector backbone, preferably an origin of replication for filamentous phages, more preferably the f1 origin of replication.
 98. The vector according to claim 95, wherein the primary vector is selected from the group comprising pBR322, pUC18, pUC19, pUC118, pUC119, pEMBL, pRSA101, pBluescript.
 99. The vector according to claim 63, as defined by any of the sequences SEQ ID NO 1 to 3.
 100. A method of preparing a primary vector comprising inserting an expressible nucleotide sequence into a cloning site in a primary vector comprising a cassette, the cassette comprising a nucleotide sequence of the general formula in 5′→3′ direction: [RS1-RS2-SP-PR-CS-TR-SP-RS2′-RS1′] wherein RS1 and RS1′ denote restriction sites, RS2 and RS2′ denote restriction sites different from RS1 and RS1′, SP individually denotes a spacer sequence of at least two nucleotides, PR denotes a promoter, CS denotes a cloning site, TR denotes a terminator.
 101. The method according to claim 100, wherein the expressible nucleotide sequences comprise genomic DNA.
 102. The method according to claim 100, further comprising isolating total mRNA from an expression state, and obtaining full length cDNA for insertion into the vector.
 103. The method according to claim 102, further comprising selection of cDNA to obtain substantially full length cDNA.
 104. The method according to any of the preceding claims 100 to 103, whereby the insertion into the primary vector comprises directional cloning.
 105. The method according to any of the preceding claims 100 to 104, whereby a substantially full length cDNA population comprises a normalised representation of cDNA species.
 106. The method according to any of the preceding claims 100 to 105, whereby a substantially full length cDNA population comprises a normalised representation of cDNA species characteristic of a given expression state.
 107. A nucleotide library comprising at least two primary vectors each vector comprising a nucleotide sequence cassette of the general formula in 5′→3′ direction: [RS1-RS2-SP-PR-X-TR-SP-RS2′-RS1′] wherein RS1 and RS1′ denote restriction sites, RS2 and RS2′ denote restriction sites different from RS1 and RS1′, SP individually denotes a spacer sequence of at least two nucleotides, PR denotes a promoter, X denotes an expressible nucleotide sequence, TR denotes a terminator, wherein the expressible nucleotide sequences are isolated from at least one expression state, and wherein at least a first and a second primary vector comprise an expressible nucleotide sequence coding for the same peptide under the control of two different promoters in said first and second primary vector.
 108. The library according to claim 107, wherein at least three primary vectors comprise an expressible nucleotide sequence coding for the same peptide under the control of three different promoters.
 109. The library according to claim 107, wherein at least four primary vectors comprise an expressible nucleotide sequence coding for the same peptide under the control of four different promoters.
 110. The library according to claim 107, wherein at least five primary vectors comprise an expressible nucleotide sequence coding for the same peptide under the control of five different promoters, such as wherein at least six primary vectors comprise an expressible nucleotide sequence coding for the same peptide under the control of six different promoters, for example wherein at least seven primary vectors comprise an expressible nucleotide sequence coding for the same peptide under the control of seven different promoters, for example wherein at least eight primary vectors comprise an expressible nucleotide sequence coding for the same peptide under the control of eight different promoters, such as wherein at least nine primary vectors comprise an expressible nucleotide sequence coding for the same peptide under the control of nine different promoters, for example wherein at least ten primary vectors comprise an expressible nucleotide sequence coding for the same peptide under the control of ten different promoters.
 111. The library according to any of the preceding claims 107 to 110, wherein the expressible nucleotide sequence coding for the same peptide comprises essentially the same nucleotide sequence, more preferably the same nucleotide sequence.
 112. The library according to any of the preceding claims 107 to 111, being maintained in a host cell capable of maintaining the vectors comprising the cassettes substantially unaltered.
 113. The library according to claim 112, wherein the host cell is selected from the group comprising bacteria such as E. coli or Bacillus subtilis, or fungi such as yeast.
 114. The library according to any of claims 107 to 113, wherein the promoters are not functional in the library host.
 115. The library according to any of the preceding claims 107 to 114, wherein RS2 and RS2′ are identical.
 116. The library according to claim 115, wherein at least two vectors comprise the same RS2 and RS2′ sequence.
 117. The library according to claim 116, wherein substantially all vectors comprise the same RS2 and RS2′ sequence.
 118. The library according to any of the preceding claims 107 to 117, comprising at least one primary vector according to claims 63 to 98.
 119. A method for preparing a nucleotide library comprising obtaining expressible nucleotide sequences, cloning the expressible nucleotide sequences into cloning sites of a mixture of primary vectors, the primary vectors comprising a cassette, the cassettes comprising a nucleotide sequence of the general formula in 5′→3′ direction: [RS1-RS2-SP-PR-CS-TR-SP-RS2′-RS1′] wherein RS1 and RS1′ denote restriction sites, RS2 and RS2′ denote restriction sites different from RS1 and RS1′, SP individually denotes a spacer sequence of at least two nucleotides, PR denotes a promoter, CS denotes a cloning site, TR denotes a terminator, and transferring the primary vectors into a host cell.
 120. The method according to claim 119, whereby the expressible nucleotide sequences comprise cDNA and/or genomic DNA.
 121. The method according to claim 119, whereby the expressible nucleotide sequences are obtained from a cDNA library.
 122. The method according to any of the claims 119 to 121, wherein the expressible nucleotide sequences are representative of an expression state.